R Markdown

This is original homework on applying k-nearest neighbours (kNN), tree-based learners, reinforcement trees, and gradient boosting methods to five datasets that I selected. The following sections describe the approach and the advantages and disadvantages of the algorithms for each dataset. We begin by loading the required packages. For each dataset, the learners listed above are applied and the results are then discussed. Each dataset is split by stratified sampling into 70% training data and 30% test data. Estimators are evaluated with a confusion matrix for classification problems and with root mean square error (RMSE) for regression problems.
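As a rough illustration of what a stratified 70/30 split does, the sketch below samples the same fraction within each class using base R only. The helper `stratified_mask` and the toy labels are hypothetical and are not part of the analysis, which relies on `sample.split` from caTools.

```r
# Hypothetical helper (not from the original analysis): builds a logical
# training mask by sampling the same fraction of rows within each class.
stratified_mask <- function(y, ratio = 0.7, seed = 582) {
  set.seed(seed)
  mask <- logical(length(y))
  for (cls in unique(as.character(y))) {
    idx <- which(y == cls)
    n_train <- round(ratio * length(idx))
    # sample positions within idx, then map them back to row numbers
    chosen <- idx[sample.int(length(idx), n_train)]
    mask[chosen] <- TRUE
  }
  mask
}

# Toy labels with a 60/40 class balance
y <- factor(rep(c("e", "p"), times = c(60, 40)))
train <- stratified_mask(y)

# Class proportions in the two parts should match the overall balance
prop_train <- mean(y[train] == "e")
prop_test  <- mean(y[!train] == "e")
```

Comparing `prop_train` and `prop_test` plays the same role as the stratification checks performed on `dataTR` and `dataTE` in the analysis.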

## Mushroom Dataset

This dataset consists of mushroom instances, each described by 22 categorical variables that capture characteristics of the mushroom. The target variable (p) indicates whether a mushroom is poisonous or not. The target is binary, but several features take more than two values. A feature with 6 nominal values could have been encoded as 6 (or 5) binary indicator variables; instead of transforming the features, I wrote my own kNN function, which first computes the distance matrix, i.e. the distance between each test instance and each training instance. The distance between two instances is the number of features, excluding the target variable, on which they disagree (the number of features minus the number of agreeing features).
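The distance described above can be sketched on a few integer-coded rows. The values below are made up for illustration, and `sweep()` is just one possible way to compare a test row against every training row; the kNN function in this report uses a different but equivalent construction.

```r
# Two hypothetical instances described by integer-coded categorical features
a <- c(1, 6, 3, 10, 2)
b <- c(1, 1, 3, 9, 2)

agreements <- sum(a == b)                    # features with identical values
mismatch_distance <- length(a) - agreements  # features that disagree

# Distances from one test row to every row of a small training matrix
train <- rbind(c(1, 6, 3, 10, 2),
               c(1, 1, 3, 9, 2),
               c(2, 6, 4, 9, 2))
test_row <- c(1, 6, 3, 9, 2)

# sweep() compares each training row with test_row column by column;
# rowSums() then counts the disagreeing features per training row
dists <- rowSums(sweep(train, 2, test_row, FUN = "!="))
```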

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
require(data.table, quietly = TRUE)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
require(skimr)
## Loading required package: skimr
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'skimr'
require(ggcorrplot)
## Loading required package: ggcorrplot
## Loading required package: ggplot2
require(GGally)
## Loading required package: GGally
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'GGally'
require(ggplot2)
library(neighbr)
library(ggplot2)
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(caret)
## Loading required package: lattice
library(rpart.plot)
## Loading required package: rpart
library(rpart)
library(caTools)
library(gbm)
## Loaded gbm 2.1.8.1
setwd("/Users/halis/Desktop")
data <- data.table(read.csv("agaricus-lepiota.data",stringsAsFactors=T))
matr <- matrix(0,nrow=dim(data)[1],ncol=dim(data)[2])
for(i in 1:dim(data)[2]){
  y <- unique(data[[i]])
  print(i)
  print(y)
  for(j in 1:length(y)){
    z <- data[[i]]==y[j]
    matr[z,i] = y[j]
  }
}
## [1] 1
## [1] e p
## Levels: e p
## [1] 2
## [1] x b s f k c
## Levels: b c f k s x
## [1] 3
## [1] s y f g
## Levels: f g s y
## [1] 4
##  [1] y w g n e p b u c r
## Levels: b c e g n p r u w y
## [1] 5
## [1] t f
## Levels: f t
## [1] 6
## [1] a l p n f c y s m
## Levels: a c f l m n p s y
## [1] 7
## [1] f a
## Levels: a f
## [1] 8
## [1] c w
## Levels: c w
## [1] 9
## [1] b n
## Levels: b n
## [1] 10
##  [1] k n g p w h u e b r y o
## Levels: b e g h k n o p r u w y
## [1] 11
## [1] e t
## Levels: e t
## [1] 12
## [1] c e b r ?
## Levels: ? b c e r
## [1] 13
## [1] s f k y
## Levels: f k s y
## [1] 14
## [1] s f y k
## Levels: f k s y
## [1] 15
## [1] w g p n b e o c y
## Levels: b c e g n o p w y
## [1] 16
## [1] w p g b n e y o c
## Levels: b c e g n o p w y
## [1] 17
## [1] p
## Levels: p
## [1] 18
## [1] w n o y
## Levels: n o w y
## [1] 19
## [1] o t n
## Levels: n o t
## [1] 20
## [1] p e l f n
## Levels: e f l n p
## [1] 21
## [1] n k u h w r o y b
## Levels: b h k n o r u w y
## [1] 22
## [1] n s a v y c
## Levels: a c n s v y
## [1] 23
## [1] g m u d p w l
## Levels: d g l m p u w
indicez <- which(data[[12]]=="?") #values that I need to predict for variable 12
req_imp <- matr[indicez,]
util_imp <- matr[-indicez,]

set.seed(582) #using caTools performing stratified sampling
split=sample.split(data$p, SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
sum(dataTE$p=="e")/length(dataTE$p) #checking stratification on test data
## [1] 0.5178498
sum(dataTR$p=="e")/length(dataTR$p) #checking stratification on training data
## [1] 0.5181147
str(dataTR)
## Classes 'data.table' and 'data.frame':   5686 obs. of  23 variables:
##  $ p  : Factor w/ 2 levels "e","p": 1 2 1 1 1 1 2 1 1 1 ...
##  $ x  : Factor w/ 6 levels "b","c","f","k",..: 1 6 6 6 1 1 6 1 6 6 ...
##  $ s  : Factor w/ 4 levels "f","g","s","y": 3 4 3 4 3 4 4 3 4 4 ...
##  $ n  : Factor w/ 10 levels "b","c","e","g",..: 9 9 4 10 9 9 9 10 10 10 ...
##  $ t  : Factor w/ 2 levels "f","t": 2 2 1 2 2 2 2 2 2 2 ...
##  $ p.1: Factor w/ 9 levels "a","c","f","l",..: 4 7 6 1 1 4 7 1 4 1 ...
##  $ f  : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
##  $ c  : Factor w/ 2 levels "c","w": 1 1 2 1 1 1 1 1 1 1 ...
##  $ n.1: Factor w/ 2 levels "b","n": 1 2 1 1 1 1 2 1 1 1 ...
##  $ k  : Factor w/ 12 levels "b","e","g","h",..: 6 6 5 6 3 6 8 3 3 6 ...
##  $ e  : Factor w/ 2 levels "e","t": 1 1 2 1 1 1 1 1 1 1 ...
##  $ e.1: Factor w/ 5 levels "?","b","c","e",..: 3 4 4 3 3 3 4 3 3 3 ...
##  $ s.1: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ s.2: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ w  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ w.1: Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ p.2: Factor w/ 1 level "p": 1 1 1 1 1 1 1 1 1 1 ...
##  $ w.2: Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ o  : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
##  $ p.3: Factor w/ 5 levels "e","f","l","n",..: 5 5 1 5 5 5 5 5 5 5 ...
##  $ k.1: Factor w/ 9 levels "b","h","k","n",..: 4 3 4 3 3 4 3 3 4 3 ...
##  $ s.3: Factor w/ 6 levels "a","c","n","s",..: 3 4 1 3 3 4 5 4 3 4 ...
##  $ u  : Factor w/ 7 levels "d","g","l","m",..: 4 6 2 2 4 4 2 4 2 4 ...
##  - attr(*, ".internal.selfref")=<externalptr>
knnfnc <- function(l,matr,k,temat){
  #l ... which index of the matr to pick as target
  #matr ... data matrix for training
  #k ... neighbor count
  #temat ... test matrix
  
  n1 <- dim(temat)[1] #number of data points for test
  n2 <- dim(matr)[1] #number of data points for training
  dataX <- matr[,-l]
  dataY <- temat[,-l]
  distmatr <- matrix(0,nrow=n1,ncol=n2) #distance matrix of dimension (n1,n2) n1 representing number of instances on test n2 representing number of instances on training, i.e. entry(i,j) depicting distance of the test instance i from training instance j
  m <- dim(matr)[2]-1 #number of features available
  for(i in 1:n1){
    distmatr[i,] <- rowSums(matrix(dataY[i,],nrow=n2,ncol=m,byrow=T)==dataX) #we check the amount of agreeing features
  }
  distmatr <- m - distmatr #we transform it to a distance matrix
  matrind <- matrix(0,nrow=n1,ncol=k) #indices for closest points
  for(i in 1:n1){
    vec1 <- distmatr[i,] #vec assignment
    ind <- c() #indice vector
    for(j in 1:k){
      y <- which.min(vec1) #min distance index
      vec1[y] <- Inf #mask the chosen index so it cannot be selected again, even when other points sit at the maximum distance m
      ind <- c(ind,y) #index vector update
    }
    matrind[i,] <- ind #aggregate on certain row
  }
  valuemat <- matrix(matr[,l][t(matrind)],nrow=n1,ncol=k,byrow=T) #target value recording
  outpvec <- rep(0,n1)
  for(i in 1:n1){#majority voting
    outpvec[i] <- as.numeric(names(which.max(table(valuemat[i,])))[1])
  }
  outpvec
}
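The neighbour selection and majority vote inside the function can also be written more compactly. The following is a minimal sketch on made-up distances and labels, assuming `order()` as a replacement for the repeated `which.min()` loop; it is an alternative formulation, not the code used for the results in this report.

```r
# Hypothetical distances from one test instance to six training instances
d <- c(4, 1, 3, 1, 0, 5)
labels <- c(2, 1, 2, 1, 1, 2)   # integer-coded training targets
k <- 3

# order() returns indices sorted by increasing distance
nearest <- order(d)[seq_len(k)]

# Majority vote among the labels of the k nearest neighbours
votes <- table(labels[nearest])
prediction <- as.numeric(names(votes)[which.max(votes)])
```

With an odd `k` and a binary target, the vote cannot end in a tie, which is one reason to prefer values such as 3 or 5.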

We first impute the missing values in the 12th column using all observations. Because the whole dataset, including instances that later end up in the test set, is used for the imputation, this may introduce some bias. I nevertheless deemed this approach more acceptable: the missing values are randomly distributed, i.e. they also occur in the test set, and using every observation makes the imputation more accurate. We perform the imputation first and then the predictions.

indicez <- which(data[[12]]=="?") #values that I need to predict for variable 12
req_imp <- matr[indicez,]
util_imp <- matr[-indicez,]
matr <- util_imp
temat <- req_imp
l <- 12
k <- 3
matr[1:5,]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    1    6    3   10    2    1    2    1    1     5     1     3     3     3
## [2,]    1    1    3    9    2    4    2    1    1     6     1     3     3     3
## [3,]    2    6    4    9    2    7    2    1    2     6     1     4     3     3
## [4,]    1    6    3    4    1    6    2    2    1     5     2     4     3     3
## [5,]    1    6    4   10    2    1    2    1    1     6     1     3     3     3
##      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
## [1,]     8     8     1     3     2     5     4     3     2
## [2,]     8     8     1     3     2     5     4     3     4
## [3,]     8     8     1     3     2     5     3     4     6
## [4,]     8     8     1     3     2     1     4     1     2
## [5,]     8     8     1     3     2     5     3     3     2
p <- knnfnc(l,matr,k,temat)
table(p)
## p
##    2    3    4 
## 1702   98  680
ind1 <- which(p==2)
ind2 <- which(p==3)
ind3 <- which(p==4)
data[[12]][indicez][ind1] <- "b"
data[[12]][indicez][ind2] <- "c"
data[[12]][indicez][ind3] <- "e"
set.seed(582) #using caTools performing stratified sampling
split=sample.split(data$p, SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
sum(dataTE$p=="e")/length(dataTE$p)
## [1] 0.5178498
sum(dataTR$p=="e")/length(dataTR$p)
## [1] 0.5181147
str(dataTR)
## Classes 'data.table' and 'data.frame':   5686 obs. of  23 variables:
##  $ p  : Factor w/ 2 levels "e","p": 1 2 1 1 1 1 2 1 1 1 ...
##  $ x  : Factor w/ 6 levels "b","c","f","k",..: 1 6 6 6 1 1 6 1 6 6 ...
##  $ s  : Factor w/ 4 levels "f","g","s","y": 3 4 3 4 3 4 4 3 4 4 ...
##  $ n  : Factor w/ 10 levels "b","c","e","g",..: 9 9 4 10 9 9 9 10 10 10 ...
##  $ t  : Factor w/ 2 levels "f","t": 2 2 1 2 2 2 2 2 2 2 ...
##  $ p.1: Factor w/ 9 levels "a","c","f","l",..: 4 7 6 1 1 4 7 1 4 1 ...
##  $ f  : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
##  $ c  : Factor w/ 2 levels "c","w": 1 1 2 1 1 1 1 1 1 1 ...
##  $ n.1: Factor w/ 2 levels "b","n": 1 2 1 1 1 1 2 1 1 1 ...
##  $ k  : Factor w/ 12 levels "b","e","g","h",..: 6 6 5 6 3 6 8 3 3 6 ...
##  $ e  : Factor w/ 2 levels "e","t": 1 1 2 1 1 1 1 1 1 1 ...
##  $ e.1: Factor w/ 5 levels "?","b","c","e",..: 3 4 4 3 3 3 4 3 3 3 ...
##  $ s.1: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ s.2: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ w  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ w.1: Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ p.2: Factor w/ 1 level "p": 1 1 1 1 1 1 1 1 1 1 ...
##  $ w.2: Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ o  : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
##  $ p.3: Factor w/ 5 levels "e","f","l","n",..: 5 5 1 5 5 5 5 5 5 5 ...
##  $ k.1: Factor w/ 9 levels "b","h","k","n",..: 4 3 4 3 3 4 3 3 4 3 ...
##  $ s.3: Factor w/ 6 levels "a","c","n","s",..: 3 4 1 3 3 4 5 4 3 4 ...
##  $ u  : Factor w/ 7 levels "d","g","l","m",..: 4 6 2 2 4 4 2 4 2 4 ...
##  - attr(*, ".internal.selfref")=<externalptr>
matrgenerator <- function(data){
  matr <- matrix(0,nrow=dim(data)[1],ncol=dim(data)[2])
  for(i in 1:dim(data)[2]){
    y <- unique(data[[i]])
    print(i)
    print(y)
    for(j in 1:length(y)){
      z <- data[[i]]==y[j]
      matr[z,i] = y[j]
    }
  }
  matr
}


matr <- matrgenerator(dataTR)
## [1] 1
## [1] e p
## Levels: e p
## [1] 2
## [1] b x s f k c
## Levels: b c f k s x
## [1] 3
## [1] s y f g
## Levels: f g s y
## [1] 4
##  [1] w g y n e p b u r c
## Levels: b c e g n p r u w y
## [1] 5
## [1] t f
## Levels: f t
## [1] 6
## [1] l p n a f c y s m
## Levels: a c f l m n p s y
## [1] 7
## [1] f a
## Levels: a f
## [1] 8
## [1] c w
## Levels: c w
## [1] 9
## [1] b n
## Levels: b n
## [1] 10
##  [1] n k g p w h u e b r o y
## Levels: b e g h k n o p r u w y
## [1] 11
## [1] e t
## Levels: e t
## [1] 12
## [1] c e b r
## Levels: ? b c e r
## [1] 13
## [1] s f k y
## Levels: f k s y
## [1] 14
## [1] s f y k
## Levels: f k s y
## [1] 15
## [1] w g p b n e o c y
## Levels: b c e g n o p w y
## [1] 16
## [1] w p g b n e y o c
## Levels: b c e g n o p w y
## [1] 17
## [1] p
## Levels: p
## [1] 18
## [1] w o n y
## Levels: n o w y
## [1] 19
## [1] o t n
## Levels: n o t
## [1] 20
## [1] p e l f n
## Levels: e f l n p
## [1] 21
## [1] n k u h w r o y b
## Levels: b h k n o r u w y
## [1] 22
## [1] n s a v y c
## Levels: a c n s v y
## [1] 23
## [1] m u g d p w l
## Levels: d g l m p u w
temat <- matrgenerator(dataTE)
## [1] 1
## [1] e p
## Levels: e p
## [1] 2
## [1] x b f s k c
## Levels: b c f k s x
## [1] 3
## [1] s y f
## Levels: f g s y
## [1] 4
##  [1] y w n g e p c b u r
## Levels: b c e g n p r u w y
## [1] 5
## [1] t f
## Levels: f t
## [1] 6
## [1] a p l n f c y s m
## Levels: a c f l m n p s y
## [1] 7
## [1] f a
## Levels: a f
## [1] 8
## [1] c w
## Levels: c w
## [1] 9
## [1] b n
## Levels: b n
## [1] 10
##  [1] k w n p g h u b e r y o
## Levels: b e g h k n o p r u w y
## [1] 11
## [1] e t
## Levels: e t
## [1] 12
## [1] c e r b
## Levels: ? b c e r
## [1] 13
## [1] s f k y
## Levels: f k s y
## [1] 14
## [1] s y f k
## Levels: f k s y
## [1] 15
## [1] w g p n b e o c y
## Levels: b c e g n o p w y
## [1] 16
## [1] w p g b n e y o c
## Levels: b c e g n o p w y
## [1] 17
## [1] p
## Levels: p
## [1] 18
## [1] w n o y
## Levels: n o w y
## [1] 19
## [1] o t n
## Levels: n o t
## [1] 20
## [1] p e l f n
## Levels: e f l n p
## [1] 21
## [1] n k u h w r o y b
## Levels: b h k n o r u w y
## [1] 22
## [1] n s v y a c
## Levels: a c n s v y
## [1] 23
## [1] g u m p d l w
## Levels: d g l m p u w
l <- 1
k <- 3
matr[1:5,]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    1    1    3    9    2    4    2    1    1     6     1     3     3     3
## [2,]    2    6    4    9    2    7    2    1    2     6     1     4     3     3
## [3,]    1    6    3    4    1    6    2    2    1     5     2     4     3     3
## [4,]    1    6    4   10    2    1    2    1    1     6     1     3     3     3
## [5,]    1    1    3    9    2    1    2    1    1     3     1     3     3     3
##      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
## [1,]     8     8     1     3     2     5     4     3     4
## [2,]     8     8     1     3     2     5     3     4     6
## [3,]     8     8     1     3     2     1     4     1     2
## [4,]     8     8     1     3     2     5     3     3     2
## [5,]     8     8     1     3     2     5     3     3     4
p <- knnfnc(l,matr,k,temat)
sum(p-temat[,l]) #a zero sum is consistent with all cases being predicted correctly (errors of opposite sign could cancel)
## [1] 0
sum(temat[,1][-ind1]-p[-ind1]) #second check on a subset; note that ind1 here still holds the indices from the imputation step above
## [1] 0
p
##    [1] 1 1 2 2 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1
##   [38] 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1
##   [75] 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1
##  [112] 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [149] 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
##  [186] 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2
##  [223] 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1
##  [260] 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1
##  [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1
##  [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
##  [371] 1 1 1 1 1 2 1 1 1 2 2 2 2 1 1 1 1 1 2 1 1 1 2 2 1 1 1 1 1 1 1 2 2 1 1 1 1
##  [408] 1 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1
##  [445] 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1
##  [482] 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
##  [519] 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2
##  [556] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1
##  [593] 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1
##  [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [704] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
##  [741] 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1
##  [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [815] 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
##  [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1
##  [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 2 2 2 1 1 2
##  [926] 1 1 1 1 2 2 2 2 2 1 2 2 1 1 1 1 1 1 1 2 1 2 1 2 1 2 2 2 1 2 2 1 2 1 1 1 2
##  [963] 2 1 2 2 1 1 2 1 2 1 1 1 1 1 1 1 2 2 1 2 2 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1
## [1000] 1 1 2 1 1 1 2 1 2 2 1 2 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1 1
## [1037] 1 1 2 2 1 2 1 2 1 1 1 2 1 1 1 2 1 2 2 1 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 2 1
## [1074] 2 1 2 1 2 1 1 1 2 2 1 1 2 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 1 1 2 1 2 1 2 1 1
## [1111] 1 1 1 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 1 2 1 2 2 1 1 1 1 1 2 2 1 2 1 1 1 1 1
## [1148] 2 2 1 2 1 2 1 2 2 2 1 2 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 2 1 2 1 2
## [1185] 2 1 2 2 2 1 2 1 2 2 2 1 2 2 1 1 2 2 2 1 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2
## [1222] 2 2 2 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 1 2 2 1 2 1 2 2 1 2 2 1 2 2 2 1 2 2 1
## [1259] 2 2 2 1 2 2 2 2 2 2 1 2 1 1 1 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 2
## [1296] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1333] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 1 2 2 1 2
## [1370] 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2
## [1407] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2
## [1444] 2 2 2 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2
## [1481] 1 2 1 2 1 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 1 2 2 2 2 2 1 1
## [1518] 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2
## [1555] 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 1 2 2 2
## [1592] 2 2 2 1 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 2 2 2 2 2
## [1629] 2 1 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1666] 2 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 1
## [1703] 1 2 2 2 2 2 2 2 1 2 2 1 1 2 2 2 2 1 1 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 1 2 2
## [1740] 1 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 2 2 2 2 2
## [1777] 1 2 2 1 1 2 2 2 1 2 2 2 1 2 2 1 2 2 2 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2
## [1814] 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1851] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1888] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2
## [1925] 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1962] 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1999] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [2036] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2
## [2073] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 1 2 2 1 1 1
## [2110] 2 1 1 2 2 1 2 1 1 2 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2
## [2147] 2 2 1 1 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 1 2 1 2 2 2 1 2 1 1 1 1 2 2 1 2 1 1
## [2184] 1 2 2 1 2 2 2 2 1 2 1 2 2 1 2 1 2 1 2 1 1 2 2 1 2 2 1 1 1 1 2 1 2 2 1 1 1
## [2221] 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 1 1 2 1 1 2 2 1 2 2 2 2 2 2 1
## [2258] 2 1 1 2 1 2 2 2 2 1 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 1 2 1 2 2 1 2 2 1 2 1
## [2295] 2 1 2 2 1 2 2 1 1 2 2 1 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 2 2 1 2 2 2 2 1
## [2332] 2 1 2 2 1 2 1 2 1 1 1 2 2 2 1 2 1 2 2 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 2 2
## [2369] 1 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 1 2 1 1 2 2 2 2 2 1 1 1 2 2 2 1 1 1 2 1 1
## [2406] 2 1 2 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1
table(p)
## p
##    1    2 
## 1262 1175
ind1 <- which(p==1)

k <- 5
p <- knnfnc(l,matr,k,temat) #for k=5 we also correctly predicted all the cases
sum(p-temat[,l]) #a zero sum is consistent with all cases being predicted correctly (errors of opposite sign could cancel)
## [1] 0
sum(temat[,1][-ind1]-p[-ind1]) #It checks out, we have correctly predicted all the cases using knn
## [1] 0
p
##    [1] 1 1 2 2 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1
##   [38] 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1
##   [75] 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1
##  [112] 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [149] 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
##  [186] 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2
##  [223] 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1
##  [260] 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1
##  [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1
##  [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
##  [371] 1 1 1 1 1 2 1 1 1 2 2 2 2 1 1 1 1 1 2 1 1 1 2 2 1 1 1 1 1 1 1 2 2 1 1 1 1
##  [408] 1 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1
##  [445] 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1
##  [482] 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
##  [519] 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2
##  [556] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1
##  [593] 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1
##  [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [704] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
##  [741] 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1
##  [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [815] 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
##  [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1
##  [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 2 2 2 1 1 2
##  [926] 1 1 1 1 2 2 2 2 2 1 2 2 1 1 1 1 1 1 1 2 1 2 1 2 1 2 2 2 1 2 2 1 2 1 1 1 2
##  [963] 2 1 2 2 1 1 2 1 2 1 1 1 1 1 1 1 2 2 1 2 2 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1
## [1000] 1 1 2 1 1 1 2 1 2 2 1 2 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1 1
## [1037] 1 1 2 2 1 2 1 2 1 1 1 2 1 1 1 2 1 2 2 1 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 2 1
## [1074] 2 1 2 1 2 1 1 1 2 2 1 1 2 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 1 1 2 1 2 1 2 1 1
## [1111] 1 1 1 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 1 2 1 2 2 1 1 1 1 1 2 2 1 2 1 1 1 1 1
## [1148] 2 2 1 2 1 2 1 2 2 2 1 2 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 2 1 2 1 2
## [1185] 2 1 2 2 2 1 2 1 2 2 2 1 2 2 1 1 2 2 2 1 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2
## [1222] 2 2 2 2 1 2 1 2 1 2 2 2 2 2 2 2 2 2 1 2 2 1 2 1 2 2 1 2 2 1 2 2 2 1 2 2 1
## [1259] 2 2 2 1 2 2 2 2 2 2 1 2 1 1 1 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 2
## [1296] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1333] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 1 2 2 1 2
## [1370] 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2
## [1407] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2
## [1444] 2 2 2 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2
## [1481] 1 2 1 2 1 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 1 2 2 2 2 2 1 1
## [1518] 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2
## [1555] 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 1 2 2 2
## [1592] 2 2 2 1 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 1 1 2 2 1 2 2 2 2 2
## [1629] 2 1 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1666] 2 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 1
## [1703] 1 2 2 2 2 2 2 2 1 2 2 1 1 2 2 2 2 1 1 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 1 2 2
## [1740] 1 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 1 1 2 2 2 2 2
## [1777] 1 2 2 1 1 2 2 2 1 2 2 2 1 2 2 1 2 2 2 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2
## [1814] 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1851] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1888] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2
## [1925] 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1962] 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [1999] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [2036] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2
## [2073] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 1 2 2 1 1 1
## [2110] 2 1 1 2 2 1 2 1 1 2 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2
## [2147] 2 2 1 1 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 1 2 1 2 2 2 1 2 1 1 1 1 2 2 1 2 1 1
## [2184] 1 2 2 1 2 2 2 2 1 2 1 2 2 1 2 1 2 1 2 1 1 2 2 1 2 2 1 1 1 1 2 1 2 2 1 1 1
## [2221] 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 1 1 2 1 1 2 2 1 2 2 2 2 2 2 1
## [2258] 2 1 1 2 1 2 2 2 2 1 2 1 2 1 2 1 1 2 1 1 1 2 2 1 2 2 1 2 1 2 2 1 2 2 1 2 1
## [2295] 2 1 2 2 1 2 2 1 1 2 2 1 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 2 2 1 2 2 2 2 1
## [2332] 2 1 2 2 1 2 1 2 1 1 1 2 2 2 1 2 1 2 2 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 2 2
## [2369] 1 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 1 2 1 1 2 2 2 2 2 1 1 1 2 2 2 1 1 1 2 1 1
## [2406] 2 1 2 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1
table(p)
## p
##    1    2 
## 1262 1175
ind1 <- which(p==1)
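The zero-sum checks above can be complemented by an element-wise comparison, since signed differences may cancel. The labels below are invented for illustration and are not the kNN results from this report.

```r
# Hypothetical integer-coded labels (1 = "e", 2 = "p"), not the real results
actual    <- c(1, 1, 2, 2, 1, 2, 1, 2)
predicted <- c(1, 2, 2, 1, 1, 2, 1, 2)

# The signed differences cancel, so the sum is zero despite two errors
signed_sum <- sum(predicted - actual)

# An element-wise comparison counts every error
accuracy <- mean(predicted == actual)

# A cross-tabulation shows which class was confused with which
conf_mat <- table(predicted = predicted, actual = actual)
```

Applied to the kNN output, the same idea would be `mean(p == temat[, l])` together with `table(p, temat[, l])`.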

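Let’s try tree-based learning on this dataset. No separate imputation step is performed here: rpart can handle missing values through surrogate splits, and in this data the "?" entries are read as an ordinary factor level, so the tree treats them as a category of their own.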
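For surrogate splits to come into play, the missing entries have to reach the tree as `NA` rather than as a "?" level. A minimal sketch of the difference, using a tiny inline CSV as a stand-in for the real file (the column names and rows here are illustrative only):

```r
# A tiny inline CSV standing in for the real file; "?" marks a missing value
csv_text <- "p,e.1\ne,c\np,?\ne,b\np,?"

# Default read: "?" becomes an ordinary factor level
as_level <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# With na.strings = "?", the same entries are read as NA
as_na <- read.csv(text = csv_text, stringsAsFactors = TRUE, na.strings = "?")

levels(as_level$e.1)      # includes "?"
sum(is.na(as_na$e.1))     # counts the entries now treated as missing
```

Whether treating "?" as a category or as `NA` works better for this dataset is not evaluated here.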

data <- data.table(read.csv("agaricus-lepiota.data",stringsAsFactors=T))
set.seed(582) #using caTools performing stratified sampling
split=sample.split(data$p, SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
str(dataTR)
## Classes 'data.table' and 'data.frame':   5686 obs. of  23 variables:
##  $ p  : Factor w/ 2 levels "e","p": 1 2 1 1 1 1 2 1 1 1 ...
##  $ x  : Factor w/ 6 levels "b","c","f","k",..: 1 6 6 6 1 1 6 1 6 6 ...
##  $ s  : Factor w/ 4 levels "f","g","s","y": 3 4 3 4 3 4 4 3 4 4 ...
##  $ n  : Factor w/ 10 levels "b","c","e","g",..: 9 9 4 10 9 9 9 10 10 10 ...
##  $ t  : Factor w/ 2 levels "f","t": 2 2 1 2 2 2 2 2 2 2 ...
##  $ p.1: Factor w/ 9 levels "a","c","f","l",..: 4 7 6 1 1 4 7 1 4 1 ...
##  $ f  : Factor w/ 2 levels "a","f": 2 2 2 2 2 2 2 2 2 2 ...
##  $ c  : Factor w/ 2 levels "c","w": 1 1 2 1 1 1 1 1 1 1 ...
##  $ n.1: Factor w/ 2 levels "b","n": 1 2 1 1 1 1 2 1 1 1 ...
##  $ k  : Factor w/ 12 levels "b","e","g","h",..: 6 6 5 6 3 6 8 3 3 6 ...
##  $ e  : Factor w/ 2 levels "e","t": 1 1 2 1 1 1 1 1 1 1 ...
##  $ e.1: Factor w/ 5 levels "?","b","c","e",..: 3 4 4 3 3 3 4 3 3 3 ...
##  $ s.1: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ s.2: Factor w/ 4 levels "f","k","s","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ w  : Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ w.1: Factor w/ 9 levels "b","c","e","g",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ p.2: Factor w/ 1 level "p": 1 1 1 1 1 1 1 1 1 1 ...
##  $ w.2: Factor w/ 4 levels "n","o","w","y": 3 3 3 3 3 3 3 3 3 3 ...
##  $ o  : Factor w/ 3 levels "n","o","t": 2 2 2 2 2 2 2 2 2 2 ...
##  $ p.3: Factor w/ 5 levels "e","f","l","n",..: 5 5 1 5 5 5 5 5 5 5 ...
##  $ k.1: Factor w/ 9 levels "b","h","k","n",..: 4 3 4 3 3 4 3 3 4 3 ...
##  $ s.3: Factor w/ 6 levels "a","c","n","s",..: 3 4 1 3 3 4 5 4 3 4 ...
##  $ u  : Factor w/ 7 levels "d","g","l","m",..: 4 6 2 2 4 4 2 4 2 4 ...
##  - attr(*, ".internal.selfref")=<externalptr>
#maxdepth=6,8,10,12,14 are evaluated

tree1=rpart(dataTR$p~.,method="class",data=dataTR,maxsurrogate = 5, usesurrogate = 1, maxdepth=14,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$p ~ ., data = dataTR, method = "class", 
##     maxsurrogate = 5, usesurrogate = 1, maxdepth = 14, xval = 10)
##   n= 5686 
## 
##           CP nsplit  rel error     xerror        xstd
## 1 0.96715328      0 1.00000000 1.00000000 0.013751107
## 2 0.02007299      1 0.03284672 0.03284672 0.003434837
## 3 0.01000000      2 0.01277372 0.01277372 0.002152498
## 
## Variable importance
## p.1 k.1   k s.1 s.2 p.3 
##  25  19  16  14  13  13 
## 
## Node number 1: 5686 observations,    complexity param=0.9671533
##   predicted class=e  expected loss=0.4818853  P(node) =1
##     class counts:  2946  2740
##    probabilities: 0.518 0.482 
##   left son=2 (3036 obs) right son=3 (2650 obs)
##   Primary splits:
##       p.1 splits as  LRRLRLRRR,    improve=2664.6040, (0 missing)
##       k.1 splits as  LRLLLRLRL,    improve=1533.8640, (0 missing)
##       k   splits as  RLRRLLLLRLLL, improve=1089.7760, (0 missing)
##       s.1 splits as  LRLL,         improve= 987.2070, (0 missing)
##       s.2 splits as  LRLL,         improve= 921.7207, (0 missing)
##   Surrogate splits:
##       k.1 splits as  LRLLLLLRL,    agree=0.861, adj=0.702, (0 split)
##       k   splits as  RLRRLLLLLLLL, agree=0.814, adj=0.601, (0 split)
##       s.1 splits as  LRLL,         agree=0.783, adj=0.535, (0 split)
##       s.2 splits as  LRLL,         agree=0.780, adj=0.529, (0 split)
##       p.3 splits as  RLRRL,        agree=0.774, adj=0.515, (0 split)
## 
## Node number 2: 3036 observations,    complexity param=0.02007299
##   predicted class=e  expected loss=0.02964427  P(node) =0.533943
##     class counts:  2946    90
##    probabilities: 0.970 0.030 
##   left son=4 (2981 obs) right son=5 (55 obs)
##   Primary splits:
##       k.1 splits as  LLLLLRLLL,    improve=105.485900, (0 missing)
##       k   splits as  -LLLLLLLRLLL, improve= 34.099420, (0 missing)
##       w.1 splits as  --LLLLLLR,    improve= 28.387960, (0 missing)
##       n   splits as  RLLLLRLLLL,   improve= 21.848150, (0 missing)
##       o   splits as  -LR,          improve=  9.771707, (0 missing)
##   Surrogate splits:
##       k splits as  -LLLLLLLRLLL, agree=0.988, adj=0.327, (0 split)
## 
## Node number 3: 2650 observations
##   predicted class=p  expected loss=0  P(node) =0.466057
##     class counts:     0  2650
##    probabilities: 0.000 1.000 
## 
## Node number 4: 2981 observations
##   predicted class=e  expected loss=0.01174103  P(node) =0.5242701
##     class counts:  2946    35
##    probabilities: 0.988 0.012 
## 
## Node number 5: 55 observations
##   predicted class=p  expected loss=0  P(node) =0.009672881
##     class counts:     0    55
##    probabilities: 0.000 1.000
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
y <- predict(tree1,type="prob",newdata=dataTE) #class probabilities, one column per class

x <- (y[,1]>0.5) #TRUE where the tree predicts edible (column 1 holds the probability of class "e")
indices <- dataTE[[1]][x]=="e"
sum(x[indices])
## [1] 1258
sum(x[indices])/sum(indices)
## [1] 0.9968304
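sum(x[indices])-sum(indices) #a gap of -4; indices has length sum(x) rather than length(x), so x[indices] relies on recycling and the gap should not be read directly as a misclassification count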
## [1] -4
y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)
x <- (y[,1]<0.5)
indices <- dataTE[[1]][x]=="p" 
sum(x[indices])/sum(indices) #we predicted all poisonous cases correctly
## [1] 1
tree1=rpart(dataTR$p~.,method="class",data=dataTR,maxsurrogate = 5, usesurrogate = 1, maxdepth=12,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$p ~ ., data = dataTR, method = "class", 
##     maxsurrogate = 5, usesurrogate = 1, maxdepth = 12, xval = 10)
##   n= 5686 
## 
##           CP nsplit  rel error     xerror        xstd
## 1 0.96715328      0 1.00000000 1.00000000 0.013751107
## 2 0.02007299      1 0.03284672 0.03284672 0.003434837
## 3 0.01000000      2 0.01277372 0.01277372 0.002152498
## 
## Variable importance
## p.1 k.1   k s.1 s.2 p.3 
##  25  19  16  14  13  13 
## 
## Node number 1: 5686 observations,    complexity param=0.9671533
##   predicted class=e  expected loss=0.4818853  P(node) =1
##     class counts:  2946  2740
##    probabilities: 0.518 0.482 
##   left son=2 (3036 obs) right son=3 (2650 obs)
##   Primary splits:
##       p.1 splits as  LRRLRLRRR,    improve=2664.6040, (0 missing)
##       k.1 splits as  LRLLLRLRL,    improve=1533.8640, (0 missing)
##       k   splits as  RLRRLLLLRLLL, improve=1089.7760, (0 missing)
##       s.1 splits as  LRLL,         improve= 987.2070, (0 missing)
##       s.2 splits as  LRLL,         improve= 921.7207, (0 missing)
##   Surrogate splits:
##       k.1 splits as  LRLLLLLRL,    agree=0.861, adj=0.702, (0 split)
##       k   splits as  RLRRLLLLLLLL, agree=0.814, adj=0.601, (0 split)
##       s.1 splits as  LRLL,         agree=0.783, adj=0.535, (0 split)
##       s.2 splits as  LRLL,         agree=0.780, adj=0.529, (0 split)
##       p.3 splits as  RLRRL,        agree=0.774, adj=0.515, (0 split)
## 
## Node number 2: 3036 observations,    complexity param=0.02007299
##   predicted class=e  expected loss=0.02964427  P(node) =0.533943
##     class counts:  2946    90
##    probabilities: 0.970 0.030 
##   left son=4 (2981 obs) right son=5 (55 obs)
##   Primary splits:
##       k.1 splits as  LLLLLRLLL,    improve=105.485900, (0 missing)
##       k   splits as  -LLLLLLLRLLL, improve= 34.099420, (0 missing)
##       w.1 splits as  --LLLLLLR,    improve= 28.387960, (0 missing)
##       n   splits as  RLLLLRLLLL,   improve= 21.848150, (0 missing)
##       o   splits as  -LR,          improve=  9.771707, (0 missing)
##   Surrogate splits:
##       k splits as  -LLLLLLLRLLL, agree=0.988, adj=0.327, (0 split)
## 
## Node number 3: 2650 observations
##   predicted class=p  expected loss=0  P(node) =0.466057
##     class counts:     0  2650
##    probabilities: 0.000 1.000 
## 
## Node number 4: 2981 observations
##   predicted class=e  expected loss=0.01174103  P(node) =0.5242701
##     class counts:  2946    35
##    probabilities: 0.988 0.012 
## 
## Node number 5: 55 observations
##   predicted class=p  expected loss=0  P(node) =0.009672881
##     class counts:     0    55
##    probabilities: 0.000 1.000
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)
x <- (y[,1]>0.5) #we predicted 4 non-poisonous cases as poisonous
indices <- dataTE[[1]][x]=="e"
y <- sum(x[indices])
y/sum(indices)
## [1] 0.9968304
sum(indices)-y
## [1] 4
y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)
x <- (y[,1]<0.5)
indices <- dataTE[[1]][x]=="p"
y <- sum(x[indices])
y/sum(indices) #we predicted all poisonous cases correctly
## [1] 1
tree1=rpart(dataTR$p~.,method="class",data=dataTR,maxsurrogate = 5, usesurrogate = 1, maxdepth=10,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$p ~ ., data = dataTR, method = "class", 
##     maxsurrogate = 5, usesurrogate = 1, maxdepth = 10, xval = 10)
##   n= 5686 
## 
##           CP nsplit  rel error     xerror        xstd
## 1 0.96715328      0 1.00000000 1.00000000 0.013751107
## 2 0.02007299      1 0.03284672 0.03284672 0.003434837
## 3 0.01000000      2 0.01277372 0.01277372 0.002152498
## 
## Variable importance
## p.1 k.1   k s.1 s.2 p.3 
##  25  19  16  14  13  13 
## 
## Node number 1: 5686 observations,    complexity param=0.9671533
##   predicted class=e  expected loss=0.4818853  P(node) =1
##     class counts:  2946  2740
##    probabilities: 0.518 0.482 
##   left son=2 (3036 obs) right son=3 (2650 obs)
##   Primary splits:
##       p.1 splits as  LRRLRLRRR,    improve=2664.6040, (0 missing)
##       k.1 splits as  LRLLLRLRL,    improve=1533.8640, (0 missing)
##       k   splits as  RLRRLLLLRLLL, improve=1089.7760, (0 missing)
##       s.1 splits as  LRLL,         improve= 987.2070, (0 missing)
##       s.2 splits as  LRLL,         improve= 921.7207, (0 missing)
##   Surrogate splits:
##       k.1 splits as  LRLLLLLRL,    agree=0.861, adj=0.702, (0 split)
##       k   splits as  RLRRLLLLLLLL, agree=0.814, adj=0.601, (0 split)
##       s.1 splits as  LRLL,         agree=0.783, adj=0.535, (0 split)
##       s.2 splits as  LRLL,         agree=0.780, adj=0.529, (0 split)
##       p.3 splits as  RLRRL,        agree=0.774, adj=0.515, (0 split)
## 
## Node number 2: 3036 observations,    complexity param=0.02007299
##   predicted class=e  expected loss=0.02964427  P(node) =0.533943
##     class counts:  2946    90
##    probabilities: 0.970 0.030 
##   left son=4 (2981 obs) right son=5 (55 obs)
##   Primary splits:
##       k.1 splits as  LLLLLRLLL,    improve=105.485900, (0 missing)
##       k   splits as  -LLLLLLLRLLL, improve= 34.099420, (0 missing)
##       w.1 splits as  --LLLLLLR,    improve= 28.387960, (0 missing)
##       n   splits as  RLLLLRLLLL,   improve= 21.848150, (0 missing)
##       o   splits as  -LR,          improve=  9.771707, (0 missing)
##   Surrogate splits:
##       k splits as  -LLLLLLLRLLL, agree=0.988, adj=0.327, (0 split)
## 
## Node number 3: 2650 observations
##   predicted class=p  expected loss=0  P(node) =0.466057
##     class counts:     0  2650
##    probabilities: 0.000 1.000 
## 
## Node number 4: 2981 observations
##   predicted class=e  expected loss=0.01174103  P(node) =0.5242701
##     class counts:  2946    35
##    probabilities: 0.988 0.012 
## 
## Node number 5: 55 observations
##   predicted class=p  expected loss=0  P(node) =0.009672881
##     class counts:     0    55
##    probabilities: 0.000 1.000
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)

x <- (y[,1]>0.5) #we predicted some non-poisonous cases as poisonous
indices <- dataTE[[1]][x]=="e"
sum(x[indices])
## [1] 1258
sum(x[indices])/sum(indices)
## [1] 0.9968304
sum(x[indices])-sum(indices) #we predicted 4 non-poisonous cases as poisonous
## [1] -4
y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)
x <- (y[,1]<0.5)
indices <- dataTE[[1]][x]=="p" 
sum(x[indices])/sum(indices) #we predicted all poisonous cases correctly
## [1] 1
tree1=rpart(dataTR$p~.,method="class",data=dataTR,maxsurrogate = 5, usesurrogate = 1, maxdepth=8,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$p ~ ., data = dataTR, method = "class", 
##     maxsurrogate = 5, usesurrogate = 1, maxdepth = 8, xval = 10)
##   n= 5686 
## 
##           CP nsplit  rel error     xerror        xstd
## 1 0.96715328      0 1.00000000 1.00000000 0.013751107
## 2 0.02007299      1 0.03284672 0.03284672 0.003434837
## 3 0.01000000      2 0.01277372 0.01277372 0.002152498
## 
## Variable importance
## p.1 k.1   k s.1 s.2 p.3 
##  25  19  16  14  13  13 
## 
## Node number 1: 5686 observations,    complexity param=0.9671533
##   predicted class=e  expected loss=0.4818853  P(node) =1
##     class counts:  2946  2740
##    probabilities: 0.518 0.482 
##   left son=2 (3036 obs) right son=3 (2650 obs)
##   Primary splits:
##       p.1 splits as  LRRLRLRRR,    improve=2664.6040, (0 missing)
##       k.1 splits as  LRLLLRLRL,    improve=1533.8640, (0 missing)
##       k   splits as  RLRRLLLLRLLL, improve=1089.7760, (0 missing)
##       s.1 splits as  LRLL,         improve= 987.2070, (0 missing)
##       s.2 splits as  LRLL,         improve= 921.7207, (0 missing)
##   Surrogate splits:
##       k.1 splits as  LRLLLLLRL,    agree=0.861, adj=0.702, (0 split)
##       k   splits as  RLRRLLLLLLLL, agree=0.814, adj=0.601, (0 split)
##       s.1 splits as  LRLL,         agree=0.783, adj=0.535, (0 split)
##       s.2 splits as  LRLL,         agree=0.780, adj=0.529, (0 split)
##       p.3 splits as  RLRRL,        agree=0.774, adj=0.515, (0 split)
## 
## Node number 2: 3036 observations,    complexity param=0.02007299
##   predicted class=e  expected loss=0.02964427  P(node) =0.533943
##     class counts:  2946    90
##    probabilities: 0.970 0.030 
##   left son=4 (2981 obs) right son=5 (55 obs)
##   Primary splits:
##       k.1 splits as  LLLLLRLLL,    improve=105.485900, (0 missing)
##       k   splits as  -LLLLLLLRLLL, improve= 34.099420, (0 missing)
##       w.1 splits as  --LLLLLLR,    improve= 28.387960, (0 missing)
##       n   splits as  RLLLLRLLLL,   improve= 21.848150, (0 missing)
##       o   splits as  -LR,          improve=  9.771707, (0 missing)
##   Surrogate splits:
##       k splits as  -LLLLLLLRLLL, agree=0.988, adj=0.327, (0 split)
## 
## Node number 3: 2650 observations
##   predicted class=p  expected loss=0  P(node) =0.466057
##     class counts:     0  2650
##    probabilities: 0.000 1.000 
## 
## Node number 4: 2981 observations
##   predicted class=e  expected loss=0.01174103  P(node) =0.5242701
##     class counts:  2946    35
##    probabilities: 0.988 0.012 
## 
## Node number 5: 55 observations
##   predicted class=p  expected loss=0  P(node) =0.009672881
##     class counts:     0    55
##    probabilities: 0.000 1.000
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)

x <- (y[,1]>0.5) #we predicted some non-poisonous cases as poisonous
indices <- dataTE[[1]][x]=="e"
sum(x[indices])
## [1] 1258
sum(x[indices])/sum(indices)
## [1] 0.9968304
sum(x[indices])-sum(indices) #we predicted 4 non-poisonous cases as poisonous
## [1] -4
y <- predict(tree1, type="prob", newdata=dataTE) # class-probability matrix; column 1 is P(e)
x <- (y[,1]<0.5)
indices <- dataTE[[1]][x]=="p" 
sum(x[indices])/sum(indices) #we predicted all poisonous cases correctly
## [1] 1
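
The checks above index `x` (one entry per test row) with `indices` (one entry per row where `x` is TRUE), so the two vectors differ in length and R recycles the shorter one. A less error-prone way to get the same kind of summary is to cross-tabulate predicted against actual classes. The sketch below runs on small made-up vectors, not on the mushroom data: `actual` and `prob_e` are hypothetical stand-ins for `dataTE[[1]]` and `y[,1]`.

```r
# Hypothetical stand-ins: true labels and predicted P(class == "e")
actual <- factor(c("e", "e", "e", "p", "p", "p", "p", "e"), levels = c("e", "p"))
prob_e <- c(0.99, 0.97, 0.60, 0.01, 0.00, 0.70, 0.02, 0.98)

# Turn probabilities into class labels with a 0.5 threshold
predicted <- factor(ifelse(prob_e > 0.5, "e", "p"), levels = c("e", "p"))

# Rows: predicted class, columns: actual class
cm <- table(predicted = predicted, actual = actual)
print(cm)

# Share of truly edible cases predicted edible, and likewise for poisonous
recall_e <- cm["e", "e"] / sum(cm[, "e"])
recall_p <- cm["p", "p"] / sum(cm[, "p"])
accuracy <- sum(diag(cm)) / sum(cm)
```

The off-diagonal cell `cm["e", "p"]` counts poisonous cases predicted as edible, which is the costlier mistake for this problem.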

Random Forest (RF)

rf.shroom <- randomForest(p ~ ., data = dataTR, mtry = 4, ntree = 500, nodesize = 5) # trControl is a caret::train() argument that randomForest() does not use, so it is dropped
plot(rf.shroom)

varImpPlot(rf.shroom)

pred.shroom = predict(rf.shroom,newdata=dataTE) #we predicted all non-poisonous cases correctly as well
indices <- dataTE[[1]]=="e"
sum(indices)
## [1] 1262
table(pred.shroom,dataTE[[1]])
##            
## pred.shroom    e    p
##           e 1262    0
##           p    0 1175
rf.shroom <- randomForest(p ~ ., data = dataTR, mtry = 5, ntree = 500, nodesize = 5) # trControl is a caret::train() argument that randomForest() does not use, so it is dropped
plot(rf.shroom)

varImpPlot(rf.shroom)

pred.shroom = predict(rf.shroom,newdata=dataTE) #we predicted all non-poisonous cases correctly as well
indices <- dataTE[[1]]=="e"
sum(indices)
## [1] 1262
table(pred.shroom,dataTE[[1]])
##            
## pred.shroom    e    p
##           e 1262    0
##           p    0 1175
rf.shroom <- randomForest(p ~ ., data = dataTR, mtry = 6, ntree = 500, nodesize = 5) # trControl is a caret::train() argument that randomForest() does not use, so it is dropped
plot(rf.shroom)

varImpPlot(rf.shroom)

pred.shroom = predict(rf.shroom,newdata=dataTE) #we predicted all non-poisonous cases correctly as well
indices <- dataTE[[1]]=="e"
sum(indices)
## [1] 1262
table(pred.shroom,dataTE[[1]])
##            
## pred.shroom    e    p
##           e 1262    0
##           p    0 1175
rf.shroom <- randomForest(p ~ ., data = dataTR, mtry = 7, ntree = 500, nodesize = 5) # trControl is a caret::train() argument that randomForest() does not use, so it is dropped
plot(rf.shroom)

varImpPlot(rf.shroom)

pred.shroom = predict(rf.shroom,newdata=dataTE) #we predicted all non-poisonous cases correctly as well
indices <- dataTE[[1]]=="e"
sum(indices)
## [1] 1262
table(pred.shroom,dataTE[[1]])
##            
## pred.shroom    e    p
##           e 1262    0
##           p    0 1175
rf.shroom <- randomForest(p ~ ., data = dataTR, mtry = 8, ntree = 500, nodesize = 5) # trControl is a caret::train() argument that randomForest() does not use, so it is dropped
plot(rf.shroom)

varImpPlot(rf.shroom)

pred.shroom = predict(rf.shroom,newdata=dataTE) #we predicted all non-poisonous cases correctly as well
indices <- dataTE[[1]]=="e"
sum(indices)
## [1] 1262
table(pred.shroom,dataTE[[1]])
##            
## pred.shroom    e    p
##           e 1262    0
##           p    0 1175

####TAIWANESE DATA

This dataset is used to predict whether a company will go bankrupt. One feature is removed because all of its entries are identical. All remaining features are numeric.
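
The constant feature is dropped by column position in the code that follows. A name-based check is one possible alternative that does not depend on column order. The sketch below is a minimal illustration on a toy data frame; `df`, `is_constant`, `constant_names` and `df_clean` are hypothetical names, not objects from this report.

```r
# Toy data frame; `flag` is constant and carries no information
df <- data.frame(
  target = c(1, 0, 0, 1),
  ratio  = c(0.2, 0.5, 0.1, 0.9),
  flag   = c(1, 1, 1, 1)
)

# TRUE for every column that has a single distinct value
is_constant <- vapply(df, function(col) length(unique(col)) == 1, logical(1))

# Names of the constant columns, and the data without them
constant_names <- names(df)[is_constant]
df_clean <- df[, !is_constant, drop = FALSE]
```

Printing `constant_names` before dropping anything makes it easy to confirm that only the intended column is removed.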

data <- data.table(read.csv("taiwanese.csv",stringsAsFactors=T))
data <- data[,-95]
str(data)
## Classes 'data.table' and 'data.frame':   6819 obs. of  95 variables:
##  $ Bankrupt.                                              : int  1 1 1 1 1 1 0 0 0 0 ...
##  $ ROA.C..before.interest.and.depreciation.before.interest: num  0.371 0.464 0.426 0.4 0.465 ...
##  $ ROA.A..before.interest.and...after.tax                 : num  0.424 0.538 0.499 0.451 0.538 ...
##  $ ROA.B..before.interest.and.depreciation.after.tax      : num  0.406 0.517 0.472 0.458 0.522 ...
##  $ Operating.Gross.Margin                                 : num  0.601 0.61 0.601 0.584 0.599 ...
##  $ Realized.Sales.Gross.Margin                            : num  0.601 0.61 0.601 0.584 0.599 ...
##  $ Operating.Profit.Rate                                  : num  0.999 0.999 0.999 0.999 0.999 ...
##  $ Pre.tax.net.Interest.Rate                              : num  0.797 0.797 0.796 0.797 0.797 ...
##  $ After.tax.net.Interest.Rate                            : num  0.809 0.809 0.808 0.809 0.809 ...
##  $ Non.industry.income.and.expenditure.revenue            : num  0.303 0.304 0.302 0.303 0.303 ...
##  $ Continuous.interest.rate..after.tax.                   : num  0.781 0.782 0.78 0.781 0.782 ...
##  $ Operating.Expense.Rate                                 : num  1.26e-04 2.90e-04 2.36e-04 1.08e-04 7.89e+09 ...
##  $ Research.and.development.expense.rate                  : num  0.00 0.00 2.55e+07 0.00 0.00 0.00 7.30e+08 5.09e+07 0.00 0.00 ...
##  $ Cash.flow.rate                                         : num  0.458 0.462 0.459 0.466 0.463 ...
##  $ Interest.bearing.debt.interest.rate                    : num  0.000725 0.000647 0.00079 0.000449 0.000686 ...
##  $ Tax.rate..A.                                           : num  0 0 0 0 0 ...
##  $ Net.Value.Per.Share..B.                                : num  0.148 0.182 0.178 0.154 0.168 ...
##  $ Net.Value.Per.Share..A.                                : num  0.148 0.182 0.178 0.154 0.168 ...
##  $ Net.Value.Per.Share..C.                                : num  0.148 0.182 0.194 0.154 0.168 ...
##  $ Persistent.EPS.in.the.Last.Four.Seasons                : num  0.169 0.209 0.181 0.194 0.213 ...
##  $ Cash.Flow.Per.Share                                    : num  0.312 0.318 0.307 0.322 0.319 ...
##  $ Revenue.Per.Share..Yuan...                             : num  0.01756 0.02114 0.00594 0.01437 0.02969 ...
##  $ Operating.Profit.Per.Share..Yuan...                    : num  0.0959 0.0937 0.0923 0.0778 0.0969 ...
##  $ Per.Share.Net.profit.before.tax..Yuan...               : num  0.139 0.17 0.143 0.149 0.168 ...
##  $ Realized.Sales.Gross.Profit.Growth.Rate                : num  0.0221 0.0221 0.0228 0.022 0.0221 ...
##  $ Operating.Profit.Growth.Rate                           : num  0.848 0.848 0.848 0.848 0.848 ...
##  $ After.tax.Net.Profit.Growth.Rate                       : num  0.689 0.69 0.689 0.689 0.69 ...
##  $ Regular.Net.Profit.Growth.Rate                         : num  0.689 0.69 0.689 0.689 0.69 ...
##  $ Continuous.Net.Profit.Growth.Rate                      : num  0.218 0.218 0.218 0.218 0.218 ...
##  $ Total.Asset.Growth.Rate                                : num  4.98e+09 6.11e+09 7.28e+09 4.88e+09 5.51e+09 6.08e+08 5.72e+09 6.63e+09 6.89e+09 5.55e+09 ...
##  $ Net.Value.Growth.Rate                                  : num  0.000327 0.000443 0.000396 0.000382 0.000439 ...
##  $ Total.Asset.Return.Growth.Rate.Ratio                   : num  0.263 0.265 0.264 0.263 0.265 ...
##  $ Cash.Reinvestment..                                    : num  0.364 0.377 0.369 0.384 0.38 ...
##  $ Current.Ratio                                          : num  0.00226 0.00602 0.01154 0.00419 0.00602 ...
##  $ Quick.Ratio                                            : num  0.00121 0.00404 0.00535 0.0029 0.00373 ...
##  $ Interest.Expense.Ratio                                 : num  0.63 0.635 0.63 0.63 0.636 ...
##  $ Total.debt.Total.net.worth                             : num  0.02127 0.0125 0.02125 0.00957 0.00515 ...
##  $ Debt.ratio..                                           : num  0.208 0.171 0.208 0.151 0.107 ...
##  $ Net.worth.Assets                                       : num  0.792 0.829 0.792 0.849 0.893 ...
##  $ Long.term.fund.suitability.ratio..A.                   : num  0.00502 0.00506 0.0051 0.00505 0.0053 ...
##  $ Borrowing.dependency                                   : num  0.39 0.377 0.379 0.38 0.375 ...
##  $ Contingent.liabilities.Net.worth                       : num  0.00648 0.00584 0.00656 0.00537 0.00662 ...
##  $ Operating.profit.Paid.in.capital                       : num  0.0959 0.0937 0.0923 0.0777 0.0969 ...
##  $ Net.profit.before.tax.Paid.in.capital                  : num  0.138 0.169 0.148 0.148 0.167 ...
##  $ Inventory.and.accounts.receivable.Net.value            : num  0.398 0.398 0.407 0.398 0.4 ...
##  $ Total.Asset.Turnover                                   : num  0.087 0.0645 0.015 0.09 0.1754 ...
##  $ Accounts.Receivable.Turnover                           : num  0.00181 0.00129 0.0015 0.00197 0.00145 ...
##  $ Average.Collection.Days                                : num  0.00349 0.00492 0.00423 0.00321 0.00437 ...
##  $ Inventory.Turnover.Rate..times.                        : num  1.82e-04 9.36e+09 6.50e+07 7.13e+09 1.63e-04 ...
##  $ Fixed.Assets.Turnover.Frequency                        : num  1.17e-04 7.19e+08 2.65e+09 9.15e+09 2.94e-04 ...
##  $ Net.Worth.Turnover.Rate..times.                        : num  0.0329 0.0255 0.0134 0.0281 0.0402 ...
##  $ Revenue.per.person                                     : num  0.03416 0.00689 0.029 0.01546 0.05811 ...
##  $ Operating.profit.per.person                            : num  0.393 0.392 0.382 0.378 0.394 ...
##  $ Allocation.rate.per.person                             : num  0.0371 0.0123 0.141 0.0213 0.024 ...
##  $ Working.Capital.to.Total.Assets                        : num  0.673 0.751 0.83 0.726 0.752 ...
##  $ Quick.Assets.Total.Assets                              : num  0.167 0.127 0.34 0.162 0.26 ...
##  $ Current.Assets.Total.Assets                            : num  0.191 0.182 0.603 0.226 0.358 ...
##  $ Cash.Total.Assets                                      : num  0.004094 0.014948 0.000991 0.018851 0.014161 ...
##  $ Quick.Assets.Current.Liability                         : num  0.002 0.00414 0.0063 0.00296 0.00427 ...
##  $ Cash.Current.Liability                                 : num  1.47e-04 1.38e-03 5.34e+09 1.01e-03 6.80e-04 ...
##  $ Current.Liability.to.Assets                            : num  0.1473 0.057 0.0982 0.0987 0.1102 ...
##  $ Operating.Funds.to.Liability                           : num  0.334 0.341 0.337 0.349 0.345 ...
##  $ Inventory.Working.Capital                              : num  0.277 0.29 0.277 0.277 0.288 ...
##  $ Inventory.Current.Liability                            : num  0.00104 0.00521 0.01388 0.00354 0.00487 ...
##  $ Current.Liabilities.Liability                          : num  0.676 0.309 0.446 0.616 0.975 ...
##  $ Working.Capital.Equity                                 : num  0.721 0.732 0.743 0.73 0.732 ...
##  $ Current.Liabilities.Equity                             : num  0.339 0.33 0.335 0.332 0.331 ...
##  $ Long.term.Liability.to.Current.Assets                  : num  0.02559 0.02395 0.00372 0.02217 0 ...
##  $ Retained.Earnings.to.Total.Assets                      : num  0.903 0.931 0.91 0.907 0.914 ...
##  $ Total.income.Total.expense                             : num  0.00202 0.00223 0.00206 0.00183 0.00222 ...
##  $ Total.expense.Assets                                   : num  0.0649 0.0255 0.0214 0.0242 0.0264 ...
##  $ Current.Asset.Turnover.Rate                            : num  7.01e+08 1.07e-04 1.79e-03 8.14e+09 6.68e+09 ...
##  $ Quick.Asset.Turnover.Rate                              : num  6.55e+09 7.70e+09 1.02e-03 6.05e+09 5.05e+09 ...
##  $ Working.capitcal.Turnover.Rate                         : num  0.594 0.594 0.595 0.594 0.594 ...
##  $ Cash.Turnover.Rate                                     : num  4.58e+08 2.49e+09 7.61e+08 2.03e+09 8.24e+08 ...
##  $ Cash.Flow.to.Sales                                     : num  0.672 0.672 0.672 0.672 0.672 ...
##  $ Fixed.Assets.to.Assets                                 : num  0.424 0.469 0.276 0.559 0.31 ...
##  $ Current.Liability.to.Liability                         : num  0.676 0.309 0.446 0.616 0.975 ...
##  $ Current.Liability.to.Equity                            : num  0.339 0.33 0.335 0.332 0.331 ...
##  $ Equity.to.Long.term.Liability                          : num  0.127 0.121 0.118 0.121 0.111 ...
##  $ Cash.Flow.to.Total.Assets                              : num  0.638 0.641 0.643 0.579 0.622 ...
##  $ Cash.Flow.to.Liability                                 : num  0.459 0.459 0.459 0.449 0.454 ...
##  $ CFO.to.Assets                                          : num  0.52 0.567 0.538 0.604 0.578 ...
##  $ Cash.Flow.to.Equity                                    : num  0.313 0.314 0.315 0.302 0.312 ...
##  $ Current.Liability.to.Current.Assets                    : num  0.1183 0.0478 0.0253 0.0672 0.0477 ...
##  $ Liability.Assets.Flag                                  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Net.Income.to.Total.Assets                             : num  0.717 0.795 0.775 0.74 0.795 ...
##  $ Total.assets.to.GNP.price                              : num  0.00922 0.00832 0.04 0.00325 0.00388 ...
##  $ No.credit.Interval                                     : num  0.623 0.624 0.624 0.623 0.624 ...
##  $ Gross.Profit.to.Sales                                  : num  0.601 0.61 0.601 0.584 0.599 ...
##  $ Net.Income.to.Stockholder.s.Equity                     : num  0.828 0.84 0.837 0.835 0.84 ...
##  $ Liability.to.Equity                                    : num  0.29 0.284 0.29 0.282 0.279 ...
##  $ Degree.of.Financial.Leverage..DFL.                     : num  0.0266 0.2646 0.0266 0.0267 0.0248 ...
##  $ Interest.Coverage.Ratio..Interest.expense.to.EBIT.     : num  0.564 0.57 0.564 0.565 0.576 ...
##  $ Equity.to.Liability                                    : num  0.0165 0.0208 0.0165 0.024 0.0355 ...
##  - attr(*, ".internal.selfref")=<externalptr>
data$Bankrupt. <- as.factor(data$Bankrupt.)
set.seed(582) # stratified 70/30 split with caTools::sample.split, which preserves the class ratio of Bankrupt.
split=sample.split(data$Bankrupt., SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
str(dataTR)
## Classes 'data.table' and 'data.frame':   4773 obs. of  95 variables:
##  $ Bankrupt.                                              : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 1 1 1 1 ...
##  $ ROA.C..before.interest.and.depreciation.before.interest: num  0.464 0.426 0.4 0.465 0.389 ...
##  $ ROA.A..before.interest.and...after.tax                 : num  0.538 0.499 0.451 0.538 0.415 ...
##  $ ROA.B..before.interest.and.depreciation.after.tax      : num  0.517 0.472 0.458 0.522 0.419 ...
##  $ Operating.Gross.Margin                                 : num  0.61 0.601 0.584 0.599 0.59 ...
##  $ Realized.Sales.Gross.Margin                            : num  0.61 0.601 0.584 0.599 0.59 ...
##  $ Operating.Profit.Rate                                  : num  0.999 0.999 0.999 0.999 0.999 ...
##  $ Pre.tax.net.Interest.Rate                              : num  0.797 0.796 0.797 0.797 0.797 ...
##  $ After.tax.net.Interest.Rate                            : num  0.809 0.808 0.809 0.809 0.809 ...
##  $ Non.industry.income.and.expenditure.revenue            : num  0.304 0.302 0.303 0.303 0.303 ...
##  $ Continuous.interest.rate..after.tax.                   : num  0.782 0.78 0.781 0.782 0.781 ...
##  $ Operating.Expense.Rate                                 : num  2.90e-04 2.36e-04 1.08e-04 7.89e+09 1.57e-04 ...
##  $ Research.and.development.expense.rate                  : num  0.00 2.55e+07 0.00 0.00 0.00 7.30e+08 5.09e+07 0.00 0.00 1.21e+09 ...
##  $ Cash.flow.rate                                         : num  0.462 0.459 0.466 0.463 0.466 ...
##  $ Interest.bearing.debt.interest.rate                    : num  0.000647 0.00079 0.000449 0.000686 0.000716 ...
##  $ Tax.rate..A.                                           : num  0 0 0 0 0 ...
##  $ Net.Value.Per.Share..B.                                : num  0.182 0.178 0.154 0.168 0.156 ...
##  $ Net.Value.Per.Share..A.                                : num  0.182 0.178 0.154 0.168 0.156 ...
##  $ Net.Value.Per.Share..C.                                : num  0.182 0.194 0.154 0.168 0.156 ...
##  $ Persistent.EPS.in.the.Last.Four.Seasons                : num  0.209 0.181 0.194 0.213 0.174 ...
##  $ Cash.Flow.Per.Share                                    : num  0.318 0.307 0.322 0.319 0.325 ...
##  $ Revenue.Per.Share..Yuan...                             : num  0.02114 0.00594 0.01437 0.02969 0.0181 ...
##  $ Operating.Profit.Per.Share..Yuan...                    : num  0.0937 0.0923 0.0778 0.0969 0.0781 ...
##  $ Per.Share.Net.profit.before.tax..Yuan...               : num  0.17 0.143 0.149 0.168 0.139 ...
##  $ Realized.Sales.Gross.Profit.Growth.Rate                : num  0.0221 0.0228 0.022 0.0221 0.0216 ...
##  $ Operating.Profit.Growth.Rate                           : num  0.848 0.848 0.848 0.848 0.848 ...
##  $ After.tax.Net.Profit.Growth.Rate                       : num  0.69 0.689 0.689 0.69 0.689 ...
##  $ Regular.Net.Profit.Growth.Rate                         : num  0.69 0.689 0.689 0.69 0.689 ...
##  $ Continuous.Net.Profit.Growth.Rate                      : num  0.218 0.218 0.218 0.218 0.218 ...
##  $ Total.Asset.Growth.Rate                                : num  6.11e+09 7.28e+09 4.88e+09 5.51e+09 6.08e+08 5.72e+09 6.63e+09 6.89e+09 5.55e+09 5.73e+09 ...
##  $ Net.Value.Growth.Rate                                  : num  0.000443 0.000396 0.000382 0.000439 0.000352 ...
##  $ Total.Asset.Return.Growth.Rate.Ratio                   : num  0.265 0.264 0.263 0.265 0.263 ...
##  $ Cash.Reinvestment..                                    : num  0.377 0.369 0.384 0.38 0.388 ...
##  $ Current.Ratio                                          : num  0.00602 0.01154 0.00419 0.00602 0.00274 ...
##  $ Quick.Ratio                                            : num  0.004039 0.005348 0.002896 0.003727 0.000855 ...
##  $ Interest.Expense.Ratio                                 : num  0.635 0.63 0.63 0.636 0.63 ...
##  $ Total.debt.Total.net.worth                             : num  0.0125 0.02125 0.00957 0.00515 0.01421 ...
##  $ Debt.ratio..                                           : num  0.171 0.208 0.151 0.107 0.18 ...
##  $ Net.worth.Assets                                       : num  0.829 0.792 0.849 0.893 0.82 ...
##  $ Long.term.fund.suitability.ratio..A.                   : num  0.00506 0.0051 0.00505 0.0053 0.00491 ...
##  $ Borrowing.dependency                                   : num  0.377 0.379 0.38 0.375 0.381 ...
##  $ Contingent.liabilities.Net.worth                       : num  0.00584 0.00656 0.00537 0.00662 0.00575 ...
##  $ Operating.profit.Paid.in.capital                       : num  0.0937 0.0923 0.0777 0.0969 0.0781 ...
##  $ Net.profit.before.tax.Paid.in.capital                  : num  0.169 0.148 0.148 0.167 0.138 ...
##  $ Inventory.and.accounts.receivable.Net.value            : num  0.398 0.407 0.398 0.4 0.4 ...
##  $ Total.Asset.Turnover                                   : num  0.0645 0.015 0.09 0.1754 0.096 ...
##  $ Accounts.Receivable.Turnover                           : num  0.00129 0.0015 0.00197 0.00145 0.00153 ...
##  $ Average.Collection.Days                                : num  0.00492 0.00423 0.00321 0.00437 0.00414 ...
##  $ Inventory.Turnover.Rate..times.                        : num  9.36e+09 6.50e+07 7.13e+09 1.63e-04 6.50e+08 ...
##  $ Fixed.Assets.Turnover.Frequency                        : num  7.19e+08 2.65e+09 9.15e+09 2.94e-04 9.30e+09 ...
##  $ Net.Worth.Turnover.Rate..times.                        : num  0.0255 0.0134 0.0281 0.0402 0.0297 ...
##  $ Revenue.per.person                                     : num  0.00689 0.029 0.01546 0.05811 0.0213 ...
##  $ Operating.profit.per.person                            : num  0.392 0.382 0.378 0.394 0.378 ...
##  $ Allocation.rate.per.person                             : num  0.0123 0.141 0.0213 0.024 0.0328 ...
##  $ Working.Capital.to.Total.Assets                        : num  0.751 0.83 0.726 0.752 0.687 ...
##  $ Quick.Assets.Total.Assets                              : num  0.1272 0.3402 0.1616 0.2603 0.0803 ...
##  $ Current.Assets.Total.Assets                            : num  0.182 0.603 0.226 0.358 0.215 ...
##  $ Cash.Total.Assets                                      : num  0.014948 0.000991 0.018851 0.014161 0.002645 ...
##  $ Quick.Assets.Current.Liability                         : num  0.004136 0.006302 0.002961 0.004275 0.000988 ...
##  $ Cash.Current.Liability                                 : num  1.38e-03 5.34e+09 1.01e-03 6.80e-04 1.01e-04 ...
##  $ Current.Liability.to.Assets                            : num  0.057 0.0982 0.0987 0.1102 0.139 ...
##  $ Operating.Funds.to.Liability                           : num  0.341 0.337 0.349 0.345 0.351 ...
##  $ Inventory.Working.Capital                              : num  0.29 0.277 0.277 0.288 0.277 ...
##  $ Inventory.Current.Liability                            : num  0.00521 0.01388 0.00354 0.00487 0.00488 ...
##  $ Current.Liabilities.Liability                          : num  0.309 0.446 0.616 0.975 0.733 ...
##  $ Working.Capital.Equity                                 : num  0.732 0.743 0.73 0.732 0.725 ...
##  $ Current.Liabilities.Equity                             : num  0.33 0.335 0.332 0.331 0.336 ...
##  $ Long.term.Liability.to.Current.Assets                  : num  0.02395 0.00372 0.02217 0 0.00377 ...
##  $ Retained.Earnings.to.Total.Assets                      : num  0.931 0.91 0.907 0.914 0.903 ...
##  $ Total.income.Total.expense                             : num  0.00223 0.00206 0.00183 0.00222 0.00187 ...
##  $ Total.expense.Assets                                   : num  0.0255 0.0214 0.0242 0.0264 0.0401 ...
##  $ Current.Asset.Turnover.Rate                            : num  1.07e-04 1.79e-03 8.14e+09 6.68e+09 8.01e+09 ...
##  $ Quick.Asset.Turnover.Rate                              : num  7.70e+09 1.02e-03 6.05e+09 5.05e+09 2.81e+09 ...
##  $ Working.capitcal.Turnover.Rate                         : num  0.594 0.595 0.594 0.594 0.594 ...
##  $ Cash.Turnover.Rate                                     : num  2.49e+09 7.61e+08 2.03e+09 8.24e+08 2.95e+08 ...
##  $ Cash.Flow.to.Sales                                     : num  0.672 0.672 0.672 0.672 0.672 ...
##  $ Fixed.Assets.to.Assets                                 : num  0.469 0.276 0.559 0.31 0.603 ...
##  $ Current.Liability.to.Liability                         : num  0.309 0.446 0.616 0.975 0.733 ...
##  $ Current.Liability.to.Equity                            : num  0.33 0.335 0.332 0.331 0.336 ...
##  $ Equity.to.Long.term.Liability                          : num  0.121 0.118 0.121 0.111 0.113 ...
##  $ Cash.Flow.to.Total.Assets                              : num  0.641 0.643 0.579 0.622 0.637 ...
##  $ Cash.Flow.to.Liability                                 : num  0.459 0.459 0.449 0.454 0.458 ...
##  $ CFO.to.Assets                                          : num  0.567 0.538 0.604 0.578 0.622 ...
##  $ Cash.Flow.to.Equity                                    : num  0.314 0.315 0.302 0.312 0.313 ...
##  $ Current.Liability.to.Current.Assets                    : num  0.0478 0.0253 0.0672 0.0477 0.0995 ...
##  $ Liability.Assets.Flag                                  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Net.Income.to.Total.Assets                             : num  0.795 0.775 0.74 0.795 0.71 ...
##  $ Total.assets.to.GNP.price                              : num  0.00832 0.04 0.00325 0.00388 0.00528 ...
##  $ No.credit.Interval                                     : num  0.624 0.624 0.623 0.624 0.623 ...
##  $ Gross.Profit.to.Sales                                  : num  0.61 0.601 0.584 0.599 0.59 ...
##  $ Net.Income.to.Stockholder.s.Equity                     : num  0.84 0.837 0.835 0.84 0.83 ...
##  $ Liability.to.Equity                                    : num  0.284 0.29 0.282 0.279 0.285 ...
##  $ Degree.of.Financial.Leverage..DFL.                     : num  0.2646 0.0266 0.0267 0.0248 0.0267 ...
##  $ Interest.Coverage.Ratio..Interest.expense.to.EBIT.     : num  0.57 0.564 0.565 0.576 0.565 ...
##  $ Equity.to.Liability                                    : num  0.0208 0.0165 0.024 0.0355 0.0195 ...
##  - attr(*, ".internal.selfref")=<externalptr>
knnFit <- train(Bankrupt. ~ ., data = dataTR, method = "knn",
                trControl = trainControl(method = "cv"),
                preProcess = c("center", "scale"),
                tuneGrid = expand.grid(k = c(3, 5, 7, 9, 11)))
knnFit
## k-Nearest Neighbors 
## 
## 4773 samples
##   94 predictor
##    2 classes: '0', '1' 
## 
## Pre-processing: centered (94), scaled (94) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 4295, 4295, 4296, 4295, 4297, 4296, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    3  0.9660605  0.2487153
##    5  0.9673193  0.2212166
##    7  0.9666894  0.1381143
##    9  0.9675276  0.1047323
##   11  0.9675276  0.1140000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 11.
# Confusion matrix on the test set: rows = predicted class, columns = actual class
y <- predict(knnFit, newdata = dataTE)
table(y, dataTE$Bankrupt.)
##    
## y      0    1
##   0 1978   58
##   1    2    8
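Only 66 of the 2046 test rows are bankrupt, so accuracy alone says little about how well the rare class is detected. As a rough sketch, the confusion matrix printed above can be re-entered by hand to get accuracy, sensitivity for the bankrupt class, and Cohen's kappa. The names `cm`, `acc`, `sens`, and `kap` are illustrative and not part of the original analysis.

```r
# Test-set confusion matrix of the kNN fit, typed in from the table above
# (rows = predicted class, columns = actual class).
cm <- matrix(c(1978, 2, 58, 8), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

n    <- sum(cm)
acc  <- sum(diag(cm)) / n               # share of correct predictions
sens <- cm["1", "1"] / sum(cm[, "1"])   # share of bankrupt firms that were caught

# Cohen's kappa: agreement beyond what the row/column margins give by chance
p_chance <- sum(rowSums(cm) * colSums(cm)) / n^2
kap      <- (acc - p_chance) / (1 - p_chance)

round(c(accuracy = acc, sensitivity = sens, kappa = kap), 3)
```

The classifier is right on roughly 97% of the test rows, yet it catches only 8 of the 66 bankrupt firms. If kappa rather than accuracy should drive the choice of `k`, caret's `train()` accepts `metric = "Kappa"`.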
# Classification tree limited to depth 4, with 10-fold cross-validation (xval)
tree1 <- rpart(dataTR$Bankrupt. ~ ., method = "class", data = dataTR,
               maxdepth = 4, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Bankrupt. ~ ., data = dataTR, method = "class", 
##     maxdepth = 4, xval = 10)
##   n= 4773 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.04329004      0 1.0000000 1.0000000 0.07927165
## 2 0.01731602      4 0.8181818 0.9805195 0.07852121
## 3 0.01000000      7 0.7662338 1.0064935 0.07952000
## 
## Variable importance
##                              Net.Value.Growth.Rate 
##                                                 14 
##                 Net.Income.to.Stockholder.s.Equity 
##                                                 10 
##           Per.Share.Net.profit.before.tax..Yuan... 
##                                                  7 
##              Net.profit.before.tax.Paid.in.capital 
##                                                  7 
##            Persistent.EPS.in.the.Last.Four.Seasons 
##                                                  6 
##                         Net.Income.to.Total.Assets 
##                                                  5 
##               Continuous.interest.rate..after.tax. 
##                                                  4 
##                               Borrowing.dependency 
##                                                  4 
##        Non.industry.income.and.expenditure.revenue 
##                                                  3 
##                        After.tax.net.Interest.Rate 
##                                                  3 
##                          Pre.tax.net.Interest.Rate 
##                                                  3 
##                                        Quick.Ratio 
##                                                  3 
##                Interest.bearing.debt.interest.rate 
##                                                  2 
##                                 Revenue.per.person 
##                                                  2 
##                     Quick.Assets.Current.Liability 
##                                                  2 
##                Current.Liability.to.Current.Assets 
##                                                  2 
##                                      Current.Ratio 
##                                                  2 
##                     Working.capitcal.Turnover.Rate 
##                                                  2 
##                         Revenue.Per.Share..Yuan... 
##                                                  2 
##                    Working.Capital.to.Total.Assets 
##                                                  1 
##                 Degree.of.Financial.Leverage..DFL. 
##                                                  1 
##                                Liability.to.Equity 
##                                                  1 
##                                       Debt.ratio.. 
##                                                  1 
##                                Equity.to.Liability 
##                                                  1 
##                                   Net.worth.Assets 
##                                                  1 
##                               Total.Asset.Turnover 
##                                                  1 
##                         Total.debt.Total.net.worth 
##                                                  1 
##                         Total.income.Total.expense 
##                                                  1 
## Interest.Coverage.Ratio..Interest.expense.to.EBIT. 
##                                                  1 
##                    Net.Worth.Turnover.Rate..times. 
##                                                  1 
##                             Cash.Current.Liability 
##                                                  1 
##                             Interest.Expense.Ratio 
##                                                  1 
##                                  Cash.Total.Assets 
##                                                  1 
##                              Operating.Profit.Rate 
##                                                  1 
## 
## Node number 1: 4773 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.03226482  P(node) =1
##     class counts:  4619   154
##    probabilities: 0.968 0.032 
##   left son=2 (4637 obs) right son=3 (136 obs)
##   Primary splits:
##       Net.Value.Growth.Rate                    < 0.0003670415 to the right, improve=41.90007, (0 missing)
##       Net.Income.to.Stockholder.s.Equity       < 0.8344808    to the right, improve=38.84998, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1807696    to the right, improve=38.56514, (0 missing)
##       Net.profit.before.tax.Paid.in.capital    < 0.1406304    to the right, improve=36.02657, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1486782    to the right, improve=33.70200, (0 missing)
##   Surrogate splits:
##       Net.Income.to.Stockholder.s.Equity       < 0.8334933    to the right, agree=0.992, adj=0.706, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1385529    to the right, agree=0.985, adj=0.463, (0 split)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1769405    to the right, agree=0.984, adj=0.449, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1361       to the right, agree=0.983, adj=0.404, (0 split)
##       Net.Income.to.Total.Assets               < 0.7094973    to the right, agree=0.981, adj=0.331, (0 split)
## 
## Node number 2: 4637 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.0209187  P(node) =0.9715064
##     class counts:  4540    97
##    probabilities: 0.979 0.021 
##   left son=4 (4359 obs) right son=5 (278 obs)
##   Primary splits:
##       Borrowing.dependency                    < 0.3826258    to the left,  improve=11.158660, (0 missing)
##       Working.Capital.Equity                  < 0.7274442    to the right, improve= 9.783354, (0 missing)
##       Degree.of.Financial.Leverage..DFL.      < 0.02669723   to the right, improve= 9.749996, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.2012385    to the right, improve= 8.927490, (0 missing)
##       Interest.Expense.Ratio                  < 0.6303504    to the right, improve= 8.555154, (0 missing)
##   Surrogate splits:
##       Liability.to.Equity        < 0.2869906    to the left,  agree=0.960, adj=0.338, (0 split)
##       Debt.ratio..               < 0.1922484    to the left,  agree=0.960, adj=0.335, (0 split)
##       Net.worth.Assets           < 0.8077516    to the right, agree=0.960, adj=0.335, (0 split)
##       Equity.to.Liability        < 0.01809593   to the right, agree=0.960, adj=0.335, (0 split)
##       Total.debt.Total.net.worth < 0.01683811   to the left,  agree=0.959, adj=0.317, (0 split)
## 
## Node number 3: 136 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.4191176  P(node) =0.02849361
##     class counts:    79    57
##    probabilities: 0.581 0.419 
##   left son=6 (38 obs) right son=7 (98 obs)
##   Primary splits:
##       Quick.Ratio                           < 0.005814193  to the right, improve=8.720051, (0 missing)
##       Quick.Assets.Current.Liability        < 0.003547663  to the right, improve=8.070588, (0 missing)
##       Net.profit.before.tax.Paid.in.capital < 0.1066247    to the right, improve=7.902999, (0 missing)
##       Working.capitcal.Turnover.Rate        < 0.5939439    to the right, improve=7.695608, (0 missing)
##       Cash.Total.Assets                     < 0.00943074   to the right, improve=7.537255, (0 missing)
##   Surrogate splits:
##       Quick.Assets.Current.Liability      < 0.005713739  to the right, agree=0.941, adj=0.789, (0 split)
##       Current.Ratio                       < 0.009712606  to the right, agree=0.890, adj=0.605, (0 split)
##       Working.capitcal.Turnover.Rate      < 0.5939439    to the right, agree=0.890, adj=0.605, (0 split)
##       Current.Liability.to.Current.Assets < 0.0300781    to the left,  agree=0.890, adj=0.605, (0 split)
##       Working.Capital.to.Total.Assets     < 0.7725806    to the right, agree=0.860, adj=0.500, (0 split)
## 
## Node number 4: 4359 observations
##   predicted class=0  expected loss=0.01215875  P(node) =0.9132621
##     class counts:  4306    53
##    probabilities: 0.988 0.012 
## 
## Node number 5: 278 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.1582734  P(node) =0.05824429
##     class counts:   234    44
##    probabilities: 0.842 0.158 
##   left son=10 (197 obs) right son=11 (81 obs)
##   Primary splits:
##       Non.industry.income.and.expenditure.revenue        < 0.303409     to the right, improve=9.121639, (0 missing)
##       Continuous.interest.rate..after.tax.               < 0.7815287    to the right, improve=8.355707, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan...           < 0.1720268    to the right, improve=8.280004, (0 missing)
##       After.tax.net.Interest.Rate                        < 0.8092315    to the right, improve=8.244631, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5606346    to the right, improve=8.196150, (0 missing)
##   Surrogate splits:
##       After.tax.net.Interest.Rate              < 0.8092007    to the right, agree=0.831, adj=0.420, (0 split)
##       Pre.tax.net.Interest.Rate                < 0.7973282    to the right, agree=0.827, adj=0.407, (0 split)
##       Continuous.interest.rate..after.tax.     < 0.7814471    to the right, agree=0.817, adj=0.370, (0 split)
##       Total.income.Total.expense               < 0.002123953  to the right, agree=0.802, adj=0.321, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1628756    to the right, agree=0.791, adj=0.284, (0 split)
## 
## Node number 6: 38 observations
##   predicted class=0  expected loss=0.1315789  P(node) =0.00796145
##     class counts:    33     5
##    probabilities: 0.868 0.132 
## 
## Node number 7: 98 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.4693878  P(node) =0.02053216
##     class counts:    46    52
##    probabilities: 0.469 0.531 
##   left son=14 (22 obs) right son=15 (76 obs)
##   Primary splits:
##       Revenue.per.person                      < 0.00693859   to the left,  improve=6.902451, (0 missing)
##       Operating.profit.per.person             < 0.371379     to the right, improve=6.275150, (0 missing)
##       Realized.Sales.Gross.Profit.Growth.Rate < 0.02222639   to the right, improve=5.580371, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=5.173469, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1805332    to the right, improve=5.153856, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate                       < 0.9982432    to the left,  agree=0.837, adj=0.273, (0 split)
##       Revenue.Per.Share..Yuan...                  < 0.006692681  to the left,  agree=0.816, adj=0.182, (0 split)
##       Non.industry.income.and.expenditure.revenue < 0.303533     to the right, agree=0.806, adj=0.136, (0 split)
##       Continuous.interest.rate..after.tax.        < 0.7800147    to the left,  agree=0.806, adj=0.136, (0 split)
##       Accounts.Receivable.Turnover                < 0.0004027472 to the left,  agree=0.806, adj=0.136, (0 split)
## 
## Node number 10: 197 observations
##   predicted class=0  expected loss=0.07614213  P(node) =0.04127383
##     class counts:   182    15
##    probabilities: 0.924 0.076 
## 
## Node number 11: 81 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.3580247  P(node) =0.01697046
##     class counts:    52    29
##    probabilities: 0.642 0.358 
##   left son=22 (51 obs) right son=23 (30 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate                < 0.0006460646 to the left,  improve=7.222803, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5620761    to the right, improve=6.365003, (0 missing)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, improve=5.206994, (0 missing)
##       Operating.Expense.Rate                             < 4.585e+09    to the left,  improve=4.679868, (0 missing)
##       Revenue.Per.Share..Yuan...                         < 0.04138875   to the left,  improve=4.457103, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02643784   to the right, agree=0.840, adj=0.567, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5626205    to the right, agree=0.778, adj=0.400, (0 split)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Current.Liability                             < 0.0002664469 to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Total.Assets                                  < 0.01247115   to the right, agree=0.728, adj=0.267, (0 split)
## 
## Node number 14: 22 observations
##   predicted class=0  expected loss=0.1818182  P(node) =0.00460926
##     class counts:    18     4
##    probabilities: 0.818 0.182 
## 
## Node number 15: 76 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.3684211  P(node) =0.0159229
##     class counts:    28    48
##    probabilities: 0.368 0.632 
##   left son=30 (8 obs) right son=31 (68 obs)
##   Primary splits:
##       Continuous.interest.rate..after.tax.    < 0.7814384    to the right, improve=7.133127, (0 missing)
##       Fixed.Assets.Turnover.Frequency         < 0.001012572  to the right, improve=5.035320, (0 missing)
##       Total.Asset.Return.Growth.Rate.Ratio    < 0.2615727    to the right, improve=4.658744, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=4.257310, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1793042    to the right, improve=3.995872, (0 missing)
##   Surrogate splits:
##       Pre.tax.net.Interest.Rate       < 0.7972556    to the right, agree=0.974, adj=0.750, (0 split)
##       After.tax.net.Interest.Rate     < 0.8091983    to the right, agree=0.974, adj=0.750, (0 split)
##       Revenue.Per.Share..Yuan...      < 0.08244476   to the right, agree=0.947, adj=0.500, (0 split)
##       Total.Asset.Turnover            < 0.3718141    to the right, agree=0.947, adj=0.500, (0 split)
##       Net.Worth.Turnover.Rate..times. < 0.1894355    to the right, agree=0.934, adj=0.375, (0 split)
## 
## Node number 22: 51 observations
##   predicted class=0  expected loss=0.1960784  P(node) =0.0106851
##     class counts:    41    10
##    probabilities: 0.804 0.196 
## 
## Node number 23: 30 observations
##   predicted class=1  expected loss=0.3666667  P(node) =0.006285355
##     class counts:    11    19
##    probabilities: 0.367 0.633 
## 
## Node number 30: 8 observations
##   predicted class=0  expected loss=0  P(node) =0.001676095
##     class counts:     8     0
##    probabilities: 1.000 0.000 
## 
## Node number 31: 68 observations
##   predicted class=1  expected loss=0.2941176  P(node) =0.0142468
##     class counts:    20    48
##    probabilities: 0.294 0.706
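The complexity table at the top of this summary suggests that the depth-4 tree is barely better than no tree at all: the cross-validated error (`xerror`) never drops far below 1. The following is a minimal sketch of one common way to read that table, using the printed values re-entered by hand. The name `cp_tab` and the one-standard-error rule are additions for illustration; the original analysis does not prune the tree.

```r
# Complexity table of the maxdepth = 4 tree, copied from the summary output.
cp_tab <- data.frame(
  CP     = c(0.04329004, 0.01731602, 0.01000000),
  nsplit = c(0, 4, 7),
  xerror = c(1.0000000, 0.9805195, 1.0064935),
  xstd   = c(0.07927165, 0.07852121, 0.07952000)
)

# Row with the lowest cross-validated error
i_min <- which.min(cp_tab$xerror)

# One-standard-error rule: the simplest tree (rows are ordered by nsplit)
# whose xerror lies within one xstd of the minimum
threshold <- cp_tab$xerror[i_min] + cp_tab$xstd[i_min]
i_1se     <- min(which(cp_tab$xerror <= threshold))

cp_tab[c(i_min, i_1se), c("CP", "nsplit", "xerror")]
```

The minimum `xerror` is at `nsplit = 4`, but the one-standard-error rule lands on the row with `nsplit = 0`, so the cross-validation does not clearly support any split. On a fitted object the same table is available as `tree1$cptable`, and `prune(tree1, cp = ...)` returns the pruned tree.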
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

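# predict.rpart has no `method` argument (it is ignored here); without `type`,
# a class tree returns a probability matrix whose first column is P(class 0)
y <- predict(tree1, method = "class", newdata = dataTE)

y1 <- rep(0, length(dataTE$Bankrupt.))
y1[y[, 1] > 0.5] <- 1      # flags rows where P(class 0) > 0.5
y1[-(y[, 1] > 0.5)] <- 0   # negative index -1: resets every element but the first to 0
table(as.factor(y1), dataTE$Bankrupt.)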
##    
##        0    1
##   0 1980   66
length(y1)
## [1] 2046
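The table above has only a row for predicted class 0, even though the tree has leaves that predict class 1 (nodes 23 and 31). This appears to come from the thresholding code rather than from the tree. `predict.rpart` selects its output through `type`, not `method`, so `y` is a matrix of class probabilities whose first column is P(class 0). In addition, `-(y[,1]>0.5)` is a negative index, not a logical negation. A small sketch with a made-up probability matrix (`prob`, `y_old`, and `y_new` are hypothetical names) shows the effect and one possible correction:

```r
# Made-up class probabilities shaped like predict.rpart output for a
# two-class tree: column "0" = P(not bankrupt), column "1" = P(bankrupt).
prob <- cbind("0" = c(0.30, 0.90, 0.40, 0.95),
              "1" = c(0.70, 0.10, 0.60, 0.05))

# Pattern used in the report
y_old <- rep(0, nrow(prob))
y_old[prob[, 1] > 0.5] <- 1      # marks rows where P(class 0) > 0.5
y_old[-(prob[, 1] > 0.5)] <- 0   # index is c(0, -1, 0, -1): drop element 1, zero the rest
y_old                            # every entry is 0 for this input

# One possible correction: threshold the probability of the positive class
y_new <- as.integer(prob[, "1"] > 0.5)
y_new
```

With the fitted tree, `predict(tree1, newdata = dataTE, type = "class")` returns the predicted labels directly and avoids the thresholding step. The confusion matrix would then need to be recomputed.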
# Same model with the depth limit reduced to 3
tree1 <- rpart(dataTR$Bankrupt. ~ ., method = "class", data = dataTR,
               maxdepth = 3, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Bankrupt. ~ ., data = dataTR, method = "class", 
##     maxdepth = 3, xval = 10)
##   n= 4773 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.04329004      0 1.0000000 1.0000000 0.07927165
## 2 0.01000000      3 0.8701299 0.9935065 0.07902241
## 
## Variable importance
##                    Net.Value.Growth.Rate 
##                                       22 
##       Net.Income.to.Stockholder.s.Equity 
##                                       16 
##    Net.profit.before.tax.Paid.in.capital 
##                                       10 
##  Persistent.EPS.in.the.Last.Four.Seasons 
##                                       10 
## Per.Share.Net.profit.before.tax..Yuan... 
##                                        9 
##               Net.Income.to.Total.Assets 
##                                        7 
##                              Quick.Ratio 
##                                        5 
##                       Revenue.per.person 
##                                        4 
##           Quick.Assets.Current.Liability 
##                                        4 
##      Current.Liability.to.Current.Assets 
##                                        3 
##                            Current.Ratio 
##                                        3 
##           Working.capitcal.Turnover.Rate 
##                                        3 
##          Working.Capital.to.Total.Assets 
##                                        2 
##                    Operating.Profit.Rate 
##                                        1 
##               Revenue.Per.Share..Yuan... 
##                                        1 
## 
## Node number 1: 4773 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.03226482  P(node) =1
##     class counts:  4619   154
##    probabilities: 0.968 0.032 
##   left son=2 (4637 obs) right son=3 (136 obs)
##   Primary splits:
##       Net.Value.Growth.Rate                    < 0.0003670415 to the right, improve=41.90007, (0 missing)
##       Net.Income.to.Stockholder.s.Equity       < 0.8344808    to the right, improve=38.84998, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1807696    to the right, improve=38.56514, (0 missing)
##       Net.profit.before.tax.Paid.in.capital    < 0.1406304    to the right, improve=36.02657, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1486782    to the right, improve=33.70200, (0 missing)
##   Surrogate splits:
##       Net.Income.to.Stockholder.s.Equity       < 0.8334933    to the right, agree=0.992, adj=0.706, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1385529    to the right, agree=0.985, adj=0.463, (0 split)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1769405    to the right, agree=0.984, adj=0.449, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1361       to the right, agree=0.983, adj=0.404, (0 split)
##       Net.Income.to.Total.Assets               < 0.7094973    to the right, agree=0.981, adj=0.331, (0 split)
## 
## Node number 2: 4637 observations
##   predicted class=0  expected loss=0.0209187  P(node) =0.9715064
##     class counts:  4540    97
##    probabilities: 0.979 0.021 
## 
## Node number 3: 136 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.4191176  P(node) =0.02849361
##     class counts:    79    57
##    probabilities: 0.581 0.419 
##   left son=6 (38 obs) right son=7 (98 obs)
##   Primary splits:
##       Quick.Ratio                           < 0.005814193  to the right, improve=8.720051, (0 missing)
##       Quick.Assets.Current.Liability        < 0.003547663  to the right, improve=8.070588, (0 missing)
##       Net.profit.before.tax.Paid.in.capital < 0.1066247    to the right, improve=7.902999, (0 missing)
##       Working.capitcal.Turnover.Rate        < 0.5939439    to the right, improve=7.695608, (0 missing)
##       Cash.Total.Assets                     < 0.00943074   to the right, improve=7.537255, (0 missing)
##   Surrogate splits:
##       Quick.Assets.Current.Liability      < 0.005713739  to the right, agree=0.941, adj=0.789, (0 split)
##       Current.Ratio                       < 0.009712606  to the right, agree=0.890, adj=0.605, (0 split)
##       Working.capitcal.Turnover.Rate      < 0.5939439    to the right, agree=0.890, adj=0.605, (0 split)
##       Current.Liability.to.Current.Assets < 0.0300781    to the left,  agree=0.890, adj=0.605, (0 split)
##       Working.Capital.to.Total.Assets     < 0.7725806    to the right, agree=0.860, adj=0.500, (0 split)
## 
## Node number 6: 38 observations
##   predicted class=0  expected loss=0.1315789  P(node) =0.00796145
##     class counts:    33     5
##    probabilities: 0.868 0.132 
## 
## Node number 7: 98 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.4693878  P(node) =0.02053216
##     class counts:    46    52
##    probabilities: 0.469 0.531 
##   left son=14 (22 obs) right son=15 (76 obs)
##   Primary splits:
##       Revenue.per.person                      < 0.00693859   to the left,  improve=6.902451, (0 missing)
##       Operating.profit.per.person             < 0.371379     to the right, improve=6.275150, (0 missing)
##       Realized.Sales.Gross.Profit.Growth.Rate < 0.02222639   to the right, improve=5.580371, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=5.173469, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1805332    to the right, improve=5.153856, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate                       < 0.9982432    to the left,  agree=0.837, adj=0.273, (0 split)
##       Revenue.Per.Share..Yuan...                  < 0.006692681  to the left,  agree=0.816, adj=0.182, (0 split)
##       Non.industry.income.and.expenditure.revenue < 0.303533     to the right, agree=0.806, adj=0.136, (0 split)
##       Continuous.interest.rate..after.tax.        < 0.7800147    to the left,  agree=0.806, adj=0.136, (0 split)
##       Accounts.Receivable.Turnover                < 0.0004027472 to the left,  agree=0.806, adj=0.136, (0 split)
## 
## Node number 14: 22 observations
##   predicted class=0  expected loss=0.1818182  P(node) =0.00460926
##     class counts:    18     4
##    probabilities: 0.818 0.182 
## 
## Node number 15: 76 observations
##   predicted class=1  expected loss=0.3684211  P(node) =0.0159229
##     class counts:    28    48
##    probabilities: 0.368 0.632
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

# Same prediction and thresholding steps as for the depth-4 tree
y <- predict(tree1, method = "class", newdata = dataTE)

y1 <- rep(0, length(dataTE$Bankrupt.))
y1[y[, 1] > 0.5] <- 1
y1[-(y[, 1] > 0.5)] <- 0
table(as.factor(y1), dataTE$Bankrupt.)
##    
##        0    1
##   0 1980   66
length(y1)
## [1] 2046
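Because only about 3% of the firms are bankrupt, a 0.5 cut-off on the leaf probabilities flags very few of them. The sketch below re-enters the training class counts of the four leaves of the depth-3 tree from the summary above (nodes 2, 6, 14, and 15) and compares two cut-offs. The data frame `leaves`, the helper `sens_at`, and the 0.1 cut-off are illustrative assumptions, and the figures are training-set numbers, not test results.

```r
# Training class counts in the leaves of the maxdepth = 3 tree,
# copied from the summary output (n0 = not bankrupt, n1 = bankrupt).
leaves <- data.frame(
  node = c(2, 6, 14, 15),
  n0   = c(4540, 33, 18, 28),
  n1   = c(97, 5, 4, 48)
)
leaves$p1 <- leaves$n1 / (leaves$n0 + leaves$n1)  # leaf estimate of P(bankrupt)

# Sensitivity and false-positive count when every leaf with p1 > cutoff
# is labelled bankrupt.
sens_at <- function(cutoff) {
  flagged <- leaves$p1 > cutoff
  c(cutoff      = cutoff,
    sensitivity = sum(leaves$n1[flagged]) / sum(leaves$n1),
    false_pos   = sum(leaves$n0[flagged]))
}

rbind(sens_at(0.5), sens_at(0.1))
```

Lowering the cut-off from 0.5 to 0.1 raises the number of bankrupt training firms flagged from 48 to 57 out of 154, at the cost of 79 instead of 28 false positives. rpart can also shift this trade-off at fitting time through `parms = list(prior = ...)` or a loss matrix.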
# Same model with the depth limit raised to 5
tree1 <- rpart(dataTR$Bankrupt. ~ ., method = "class", data = dataTR,
               maxdepth = 5, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Bankrupt. ~ ., data = dataTR, method = "class", 
##     maxdepth = 5, xval = 10)
##   n= 4773 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.04329004      0 1.0000000 1.0000000 0.07927165
## 2 0.01731602      4 0.8181818 0.9675325 0.07801635
## 3 0.01298701      7 0.7662338 1.0519481 0.08123415
## 4 0.01082251      8 0.7532468 1.1168831 0.08361302
## 5 0.01000000     11 0.7207792 1.1168831 0.08361302
## 
## Variable importance
##                              Net.Value.Growth.Rate 
##                                                 12 
##                 Net.Income.to.Stockholder.s.Equity 
##                                                  9 
##            Persistent.EPS.in.the.Last.Four.Seasons 
##                                                  6 
##           Per.Share.Net.profit.before.tax..Yuan... 
##                                                  6 
##              Net.profit.before.tax.Paid.in.capital 
##                                                  6 
##                         Net.Income.to.Total.Assets 
##                                                  5 
##                Interest.bearing.debt.interest.rate 
##                                                  3 
##               Continuous.interest.rate..after.tax. 
##                                                  3 
##                               Borrowing.dependency 
##                                                  3 
##        Non.industry.income.and.expenditure.revenue 
##                                                  3 
##                                 Revenue.per.person 
##                                                  3 
##                        After.tax.net.Interest.Rate 
##                                                  3 
##                 Degree.of.Financial.Leverage..DFL. 
##                                                  3 
##                          Pre.tax.net.Interest.Rate 
##                                                  3 
##                                        Quick.Ratio 
##                                                  2 
##                             Interest.Expense.Ratio 
##                                                  2 
## Interest.Coverage.Ratio..Interest.expense.to.EBIT. 
##                                                  2 
##                     Quick.Assets.Current.Liability 
##                                                  2 
##                Current.Liability.to.Current.Assets 
##                                                  1 
##                                      Current.Ratio 
##                                                  1 
##                     Working.capitcal.Turnover.Rate 
##                                                  1 
##                                Liability.to.Equity 
##                                                  1 
##                                       Debt.ratio.. 
##                                                  1 
##                                Equity.to.Liability 
##                                                  1 
##                                   Net.worth.Assets 
##                                                  1 
##                         Total.debt.Total.net.worth 
##                                                  1 
##                         Revenue.Per.Share..Yuan... 
##                                                  1 
##                    Working.Capital.to.Total.Assets 
##                                                  1 
##              Long.term.Liability.to.Current.Assets 
##                                                  1 
##                               Total.Asset.Turnover 
##                                                  1 
##                      Equity.to.Long.term.Liability 
##                                                  1 
##                             Cash.Current.Liability 
##                                                  1 
##                         Total.income.Total.expense 
##                                                  1 
##                      Current.Liabilities.Liability 
##                                                  1 
##                        Current.Liability.to.Assets 
##                                                  1 
##                     Current.Liability.to.Liability 
##                                                  1 
##                    Net.Worth.Turnover.Rate..times. 
##                                                  1 
##                                  Cash.Total.Assets 
##                                                  1 
##                              Operating.Profit.Rate 
##                                                  1 
## 
## Node number 1: 4773 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.03226482  P(node) =1
##     class counts:  4619   154
##    probabilities: 0.968 0.032 
##   left son=2 (4637 obs) right son=3 (136 obs)
##   Primary splits:
##       Net.Value.Growth.Rate                    < 0.0003670415 to the right, improve=41.90007, (0 missing)
##       Net.Income.to.Stockholder.s.Equity       < 0.8344808    to the right, improve=38.84998, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1807696    to the right, improve=38.56514, (0 missing)
##       Net.profit.before.tax.Paid.in.capital    < 0.1406304    to the right, improve=36.02657, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1486782    to the right, improve=33.70200, (0 missing)
##   Surrogate splits:
##       Net.Income.to.Stockholder.s.Equity       < 0.8334933    to the right, agree=0.992, adj=0.706, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1385529    to the right, agree=0.985, adj=0.463, (0 split)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1769405    to the right, agree=0.984, adj=0.449, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1361       to the right, agree=0.983, adj=0.404, (0 split)
##       Net.Income.to.Total.Assets               < 0.7094973    to the right, agree=0.981, adj=0.331, (0 split)
## 
## Node number 2: 4637 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.0209187  P(node) =0.9715064
##     class counts:  4540    97
##    probabilities: 0.979 0.021 
##   left son=4 (4359 obs) right son=5 (278 obs)
##   Primary splits:
##       Borrowing.dependency                    < 0.3826258    to the left,  improve=11.158660, (0 missing)
##       Working.Capital.Equity                  < 0.7274442    to the right, improve= 9.783354, (0 missing)
##       Degree.of.Financial.Leverage..DFL.      < 0.02669723   to the right, improve= 9.749996, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.2012385    to the right, improve= 8.927490, (0 missing)
##       Interest.Expense.Ratio                  < 0.6303504    to the right, improve= 8.555154, (0 missing)
##   Surrogate splits:
##       Liability.to.Equity        < 0.2869906    to the left,  agree=0.960, adj=0.338, (0 split)
##       Debt.ratio..               < 0.1922484    to the left,  agree=0.960, adj=0.335, (0 split)
##       Net.worth.Assets           < 0.8077516    to the right, agree=0.960, adj=0.335, (0 split)
##       Equity.to.Liability        < 0.01809593   to the right, agree=0.960, adj=0.335, (0 split)
##       Total.debt.Total.net.worth < 0.01683811   to the left,  agree=0.959, adj=0.317, (0 split)
## 
## Node number 3: 136 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.4191176  P(node) =0.02849361
##     class counts:    79    57
##    probabilities: 0.581 0.419 
##   left son=6 (38 obs) right son=7 (98 obs)
##   Primary splits:
##       Quick.Ratio                           < 0.005814193  to the right, improve=8.720051, (0 missing)
##       Quick.Assets.Current.Liability        < 0.003547663  to the right, improve=8.070588, (0 missing)
##       Net.profit.before.tax.Paid.in.capital < 0.1066247    to the right, improve=7.902999, (0 missing)
##       Working.capitcal.Turnover.Rate        < 0.5939439    to the right, improve=7.695608, (0 missing)
##       Cash.Total.Assets                     < 0.00943074   to the right, improve=7.537255, (0 missing)
##   Surrogate splits:
##       Quick.Assets.Current.Liability      < 0.005713739  to the right, agree=0.941, adj=0.789, (0 split)
##       Current.Ratio                       < 0.009712606  to the right, agree=0.890, adj=0.605, (0 split)
##       Working.capitcal.Turnover.Rate      < 0.5939439    to the right, agree=0.890, adj=0.605, (0 split)
##       Current.Liability.to.Current.Assets < 0.0300781    to the left,  agree=0.890, adj=0.605, (0 split)
##       Working.Capital.to.Total.Assets     < 0.7725806    to the right, agree=0.860, adj=0.500, (0 split)
## 
## Node number 4: 4359 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.01215875  P(node) =0.9132621
##     class counts:  4306    53
##    probabilities: 0.988 0.012 
##   left son=8 (4100 obs) right son=9 (259 obs)
##   Primary splits:
##       Persistent.EPS.in.the.Last.Four.Seasons            < 0.1997258    to the right, improve=3.569293, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02671931   to the right, improve=3.116037, (0 missing)
##       Total.income.Total.expense                         < 0.002060573  to the right, improve=3.069896, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5647908    to the right, improve=3.023697, (0 missing)
##       Interest.Expense.Ratio                             < 0.6303504    to the right, improve=2.964888, (0 missing)
##   Surrogate splits:
##       Net.profit.before.tax.Paid.in.capital             < 0.1565116    to the right, agree=0.980, adj=0.664, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan...          < 0.1575281    to the right, agree=0.978, adj=0.633, (0 split)
##       Net.Income.to.Stockholder.s.Equity                < 0.8378972    to the right, agree=0.973, adj=0.548, (0 split)
##       ROA.B..before.interest.and.depreciation.after.tax < 0.4829755    to the right, agree=0.967, adj=0.444, (0 split)
##       Net.Income.to.Total.Assets                        < 0.7680602    to the right, agree=0.965, adj=0.413, (0 split)
## 
## Node number 5: 278 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.1582734  P(node) =0.05824429
##     class counts:   234    44
##    probabilities: 0.842 0.158 
##   left son=10 (197 obs) right son=11 (81 obs)
##   Primary splits:
##       Non.industry.income.and.expenditure.revenue        < 0.303409     to the right, improve=9.121639, (0 missing)
##       Continuous.interest.rate..after.tax.               < 0.7815287    to the right, improve=8.355707, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan...           < 0.1720268    to the right, improve=8.280004, (0 missing)
##       After.tax.net.Interest.Rate                        < 0.8092315    to the right, improve=8.244631, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5606346    to the right, improve=8.196150, (0 missing)
##   Surrogate splits:
##       After.tax.net.Interest.Rate              < 0.8092007    to the right, agree=0.831, adj=0.420, (0 split)
##       Pre.tax.net.Interest.Rate                < 0.7973282    to the right, agree=0.827, adj=0.407, (0 split)
##       Continuous.interest.rate..after.tax.     < 0.7814471    to the right, agree=0.817, adj=0.370, (0 split)
##       Total.income.Total.expense               < 0.002123953  to the right, agree=0.802, adj=0.321, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1628756    to the right, agree=0.791, adj=0.284, (0 split)
## 
## Node number 6: 38 observations
##   predicted class=0  expected loss=0.1315789  P(node) =0.00796145
##     class counts:    33     5
##    probabilities: 0.868 0.132 
## 
## Node number 7: 98 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.4693878  P(node) =0.02053216
##     class counts:    46    52
##    probabilities: 0.469 0.531 
##   left son=14 (22 obs) right son=15 (76 obs)
##   Primary splits:
##       Revenue.per.person                      < 0.00693859   to the left,  improve=6.902451, (0 missing)
##       Operating.profit.per.person             < 0.371379     to the right, improve=6.275150, (0 missing)
##       Realized.Sales.Gross.Profit.Growth.Rate < 0.02222639   to the right, improve=5.580371, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=5.173469, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1805332    to the right, improve=5.153856, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate                       < 0.9982432    to the left,  agree=0.837, adj=0.273, (0 split)
##       Revenue.Per.Share..Yuan...                  < 0.006692681  to the left,  agree=0.816, adj=0.182, (0 split)
##       Non.industry.income.and.expenditure.revenue < 0.303533     to the right, agree=0.806, adj=0.136, (0 split)
##       Continuous.interest.rate..after.tax.        < 0.7800147    to the left,  agree=0.806, adj=0.136, (0 split)
##       Accounts.Receivable.Turnover                < 0.0004027472 to the left,  agree=0.806, adj=0.136, (0 split)
## 
## Node number 8: 4100 observations
##   predicted class=0  expected loss=0.007073171  P(node) =0.8589985
##     class counts:  4071    29
##    probabilities: 0.993 0.007 
## 
## Node number 9: 259 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.09266409  P(node) =0.05426357
##     class counts:   235    24
##    probabilities: 0.907 0.093 
##   left son=18 (222 obs) right son=19 (37 obs)
##   Primary splits:
##       Interest.Expense.Ratio                             < 0.6298537    to the right, improve=5.777349, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02660342   to the right, improve=4.434238, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5640698    to the right, improve=4.434238, (0 missing)
##       Cash.Total.Assets                                  < 0.01088273   to the right, improve=4.153023, (0 missing)
##       Interest.bearing.debt.interest.rate                < 0.0004480448 to the left,  improve=4.049067, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02660342   to the right, agree=0.981, adj=0.865, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5640698    to the right, agree=0.981, adj=0.865, (0 split)
##       ROA.A..before.interest.and...after.tax             < 0.497138     to the left,  agree=0.892, adj=0.243, (0 split)
##       Cash.Current.Liability                             < 0.000247404  to the right, agree=0.876, adj=0.135, (0 split)
##       Net.Income.to.Total.Assets                         < 0.7717563    to the left,  agree=0.876, adj=0.135, (0 split)
## 
## Node number 10: 197 observations
##   predicted class=0  expected loss=0.07614213  P(node) =0.04127383
##     class counts:   182    15
##    probabilities: 0.924 0.076 
## 
## Node number 11: 81 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.3580247  P(node) =0.01697046
##     class counts:    52    29
##    probabilities: 0.642 0.358 
##   left son=22 (51 obs) right son=23 (30 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate                < 0.0006460646 to the left,  improve=7.222803, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5620761    to the right, improve=6.365003, (0 missing)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, improve=5.206994, (0 missing)
##       Operating.Expense.Rate                             < 4.585e+09    to the left,  improve=4.679868, (0 missing)
##       Revenue.Per.Share..Yuan...                         < 0.04138875   to the left,  improve=4.457103, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02643784   to the right, agree=0.840, adj=0.567, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5626205    to the right, agree=0.778, adj=0.400, (0 split)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Current.Liability                             < 0.0002664469 to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Total.Assets                                  < 0.01247115   to the right, agree=0.728, adj=0.267, (0 split)
## 
## Node number 14: 22 observations
##   predicted class=0  expected loss=0.1818182  P(node) =0.00460926
##     class counts:    18     4
##    probabilities: 0.818 0.182 
## 
## Node number 15: 76 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.3684211  P(node) =0.0159229
##     class counts:    28    48
##    probabilities: 0.368 0.632 
##   left son=30 (8 obs) right son=31 (68 obs)
##   Primary splits:
##       Continuous.interest.rate..after.tax.    < 0.7814384    to the right, improve=7.133127, (0 missing)
##       Fixed.Assets.Turnover.Frequency         < 0.001012572  to the right, improve=5.035320, (0 missing)
##       Total.Asset.Return.Growth.Rate.Ratio    < 0.2615727    to the right, improve=4.658744, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=4.257310, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1793042    to the right, improve=3.995872, (0 missing)
##   Surrogate splits:
##       Pre.tax.net.Interest.Rate       < 0.7972556    to the right, agree=0.974, adj=0.750, (0 split)
##       After.tax.net.Interest.Rate     < 0.8091983    to the right, agree=0.974, adj=0.750, (0 split)
##       Revenue.Per.Share..Yuan...      < 0.08244476   to the right, agree=0.947, adj=0.500, (0 split)
##       Total.Asset.Turnover            < 0.3718141    to the right, agree=0.947, adj=0.500, (0 split)
##       Net.Worth.Turnover.Rate..times. < 0.1894355    to the right, agree=0.934, adj=0.375, (0 split)
## 
## Node number 18: 222 observations
##   predicted class=0  expected loss=0.04954955  P(node) =0.04651163
##     class counts:   211    11
##    probabilities: 0.950 0.050 
## 
## Node number 19: 37 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.3513514  P(node) =0.007751938
##     class counts:    24    13
##    probabilities: 0.649 0.351 
##   left son=38 (28 obs) right son=39 (9 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate  < 0.0007065707 to the left,  improve=4.325182, (0 missing)
##       Fixed.Assets.Turnover.Frequency      < 4.235e+09    to the left,  improve=3.531532, (0 missing)
##       Continuous.interest.rate..after.tax. < 0.781343     to the right, improve=3.435453, (0 missing)
##       Inventory.Current.Liability          < 0.004639869  to the right, improve=3.435453, (0 missing)
##       Cash.Total.Assets                    < 0.01279077   to the right, improve=3.331532, (0 missing)
##   Surrogate splits:
##       Total.debt.Total.net.worth < 0.01745365   to the left,  agree=0.838, adj=0.333, (0 split)
##       Debt.ratio..               < 0.1946216    to the left,  agree=0.838, adj=0.333, (0 split)
##       Net.worth.Assets           < 0.8053784    to the right, agree=0.838, adj=0.333, (0 split)
##       Liability.to.Equity        < 0.2874368    to the left,  agree=0.838, adj=0.333, (0 split)
##       Equity.to.Liability        < 0.01783194   to the right, agree=0.838, adj=0.333, (0 split)
## 
## Node number 22: 51 observations
##   predicted class=0  expected loss=0.1960784  P(node) =0.0106851
##     class counts:    41    10
##    probabilities: 0.804 0.196 
## 
## Node number 23: 30 observations,    complexity param=0.01298701
##   predicted class=1  expected loss=0.3666667  P(node) =0.006285355
##     class counts:    11    19
##    probabilities: 0.367 0.633 
##   left son=46 (20 obs) right son=47 (10 obs)
##   Primary splits:
##       Long.term.Liability.to.Current.Assets                   < 0.004452273  to the right, improve=4.033333, (0 missing)
##       ROA.C..before.interest.and.depreciation.before.interest < 0.482377     to the right, improve=3.600000, (0 missing)
##       Revenue.per.person                                      < 0.04380682   to the left,  improve=3.457143, (0 missing)
##       Allocation.rate.per.person                              < 0.05656835   to the left,  improve=2.933333, (0 missing)
##       ROA.B..before.interest.and.depreciation.after.tax       < 0.531854     to the right, improve=2.838311, (0 missing)
##   Surrogate splits:
##       Equity.to.Long.term.Liability  < 0.1208088    to the right, agree=0.933, adj=0.8, (0 split)
##       Revenue.per.person             < 0.04817649   to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liability.to.Assets    < 0.1569419    to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liabilities.Liability  < 0.7965408    to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liability.to.Liability < 0.7965408    to the left,  agree=0.900, adj=0.7, (0 split)
## 
## Node number 30: 8 observations
##   predicted class=0  expected loss=0  P(node) =0.001676095
##     class counts:     8     0
##    probabilities: 1.000 0.000 
## 
## Node number 31: 68 observations
##   predicted class=1  expected loss=0.2941176  P(node) =0.0142468
##     class counts:    20    48
##    probabilities: 0.294 0.706 
## 
## Node number 38: 28 observations
##   predicted class=0  expected loss=0.2142857  P(node) =0.005866331
##     class counts:    22     6
##    probabilities: 0.786 0.214 
## 
## Node number 39: 9 observations
##   predicted class=1  expected loss=0.2222222  P(node) =0.001885607
##     class counts:     2     7
##    probabilities: 0.222 0.778 
## 
## Node number 46: 20 observations
##   predicted class=0  expected loss=0.45  P(node) =0.004190237
##     class counts:    11     9
##    probabilities: 0.550 0.450 
## 
## Node number 47: 10 observations
##   predicted class=1  expected loss=0  P(node) =0.002095118
##     class counts:     0    10
##    probabilities: 0.000 1.000
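
The per-node figures in the summary above can be checked by hand. The sketch below recomputes the expected loss and `P(node)` reported for node 3, and the agreement figures of the first surrogate split at node 1, assuming rpart's default priors and losses. The count of 40 disagreeing observations is not printed in the summary; it is back-solved from the rounded `agree=0.992` and should be read as an assumption.

```r
# Figures copied from the summary output above.
n_root <- 4773                    # observations at node 1
node3  <- c("0" = 79, "1" = 57)   # class counts at node 3

# Expected loss: share of the node that is not in its predicted (majority) class.
expected_loss <- 1 - max(node3) / sum(node3)   # 57 / 136
# P(node): share of all training observations that reach the node.
p_node <- sum(node3) / n_root                  # 136 / 4773

# Surrogate agreement at node 1 (the primary split sends 4637 left, 136 right).
n_major    <- 4637   # observations sent in the majority direction
n_disagree <- 40     # ASSUMPTION: back-solved from agree=0.992, not printed
n_agree    <- n_root - n_disagree
agree <- n_agree / n_root
# Adjusted agreement: improvement over always sending cases the majority way.
adj <- (n_agree - n_major) / (n_root - n_major)

round(c(expected_loss = expected_loss, p_node = p_node,
        agree = agree, adj = adj), 4)
```

Under these assumptions the values line up with the reported `expected loss=0.4191176`, `P(node) =0.02849361`, `agree=0.992` and `adj=0.706`.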
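# Plot the tree: split variable names in the interior nodes (type = 5), class
# probabilities plus percentage of observations (extra = 104), node numbers.
prp(tree1, type = 5, extra = 104, nn = TRUE, tweak = 1.7)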

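# predict() on a classification tree returns class probabilities by default:
# column 1 is P(class 0), column 2 is P(class 1).
y <- predict(tree1, newdata = dataTE, type = "prob")

y1 <- rep(0, length(dataTE$Bankrupt.))
# y[, 1] is P(class 0), so this marks rows where class 0 is the likelier class.
y1[y[, 1] > 0.5] <- 1
# -(logical) evaluates to 0 / -1, i.e. the negative index -1, so every element
# except the first is set back to 0.
y1[-(y[, 1] > 0.5)] <- 0
table(as.factor(y1), dataTE$Bankrupt.)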
##    
##        0    1
##   0 1980   66
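
When class labels are needed from an rpart classifier, one option is to threshold the probability column of the positive class, and another is to let `predict()` return labels directly with `type = "class"`. The sketch below illustrates both on the `kyphosis` data that ships with rpart, used here only as a stand-in for `dataTR` and `dataTE`. The seed, the split and all object names are assumptions made for illustration, not the setup used in this homework.

```r
library(rpart)

# Toy stand-in for dataTR / dataTE: a 70/30 split of rpart's kyphosis data.
set.seed(1)
idx   <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

fit <- rpart(Kyphosis ~ ., data = train, method = "class")

# type = "prob" returns one column per class, named after the factor levels.
prob <- predict(fit, newdata = test, type = "prob")

# Threshold the probability of the positive class ("present"), selected by name.
pred <- ifelse(prob[, "present"] > 0.5, "present", "absent")
pred <- factor(pred, levels = levels(test$Kyphosis))

# type = "class" returns the predicted labels directly.
pred_class <- predict(fit, newdata = test, type = "class")

# Confusion matrix: rows are predictions, columns are the true labels.
conf_mat <- table(predicted = pred, actual = test$Kyphosis)
print(conf_mat)
```

Selecting the probability column by class name rather than by position avoids depending on the order of the factor levels, and fixing the factor levels keeps both rows in the table even when one class is never predicted.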
length(y1)
## [1] 2046
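
When a tree is fitted with `xval` greater than zero, rpart stores a cross-validated error (`xerror`) for each complexity level in its CP table, and that column is one common basis for pruning. A minimal sketch, again using `kyphosis` as a stand-in dataset; the seed and the small `cp = 0.001` used to grow a larger tree first are hypothetical choices.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so fix the seed
fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
             control = rpart.control(maxdepth = 6, xval = 10, cp = 0.001))

# The CP table has columns CP, nsplit, rel error, xerror and xstd.
cp_tab <- fit$cptable

# Pick the complexity value with the smallest cross-validated error.
best_row <- which.min(cp_tab[, "xerror"])
best_cp  <- cp_tab[best_row, "CP"]

# Prune the fitted tree back to that complexity level.
pruned <- prune(fit, cp = best_cp)
printcp(pruned)
```

If the smallest `xerror` sits in the first row (zero splits), this rule prunes back to the root, which indicates that cross-validation does not support any of the splits.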
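# Refit the classification tree; maxdepth and xval are passed on to rpart.control().
tree1 <- rpart(dataTR$Bankrupt. ~ ., method = "class", data = dataTR, maxdepth = 6, xval = 10)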
summary(tree1)
## Call:
## rpart(formula = dataTR$Bankrupt. ~ ., data = dataTR, method = "class", 
##     maxdepth = 6, xval = 10)
##   n= 4773 
## 
##           CP nsplit rel error   xerror       xstd
## 1 0.04329004      0 1.0000000 1.000000 0.07927165
## 2 0.01948052      4 0.8181818 1.012987 0.07976747
## 3 0.01731602      6 0.7792208 1.051948 0.08123415
## 4 0.01623377     11 0.6883117 1.077922 0.08219525
## 5 0.01082251     13 0.6558442 1.103896 0.08314354
## 6 0.01000000     16 0.6233766 1.103896 0.08314354
## 
## Variable importance
##                              Net.Value.Growth.Rate 
##                                                  9 
##                 Net.Income.to.Stockholder.s.Equity 
##                                                  7 
##           Per.Share.Net.profit.before.tax..Yuan... 
##                                                  6 
##              Net.profit.before.tax.Paid.in.capital 
##                                                  6 
##            Persistent.EPS.in.the.Last.Four.Seasons 
##                                                  5 
##                         Net.Income.to.Total.Assets 
##                                                  5 
##                Interest.bearing.debt.interest.rate 
##                                                  3 
##               Continuous.interest.rate..after.tax. 
##                                                  3 
##                        After.tax.net.Interest.Rate 
##                                                  3 
##                          Pre.tax.net.Interest.Rate 
##                                                  3 
##                               Borrowing.dependency 
##                                                  3 
##                 Degree.of.Financial.Leverage..DFL. 
##                                                  2 
##        Non.industry.income.and.expenditure.revenue 
##                                                  2 
##                                 Revenue.per.person 
##                                                  2 
##                             Interest.Expense.Ratio 
##                                                  2 
## Interest.Coverage.Ratio..Interest.expense.to.EBIT. 
##                                                  2 
##                                        Quick.Ratio 
##                                                  2 
##                              Operating.Profit.Rate 
##                                                  2 
##                         Revenue.Per.Share..Yuan... 
##                                                  2 
##                     Quick.Assets.Current.Liability 
##                                                  2 
##               Total.Asset.Return.Growth.Rate.Ratio 
##                                                  1 
##                               Total.Asset.Turnover 
##                                                  1 
##             ROA.A..before.interest.and...after.tax 
##                                                  1 
##                    Working.Capital.to.Total.Assets 
##                                                  1 
##                Current.Liability.to.Current.Assets 
##                                                  1 
##                                      Current.Ratio 
##                                                  1 
##                     Working.capitcal.Turnover.Rate 
##                                                  1 
##                                Liability.to.Equity 
##                                                  1 
##                                       Debt.ratio.. 
##                                                  1 
##                                Equity.to.Liability 
##                                                  1 
##                                   Net.worth.Assets 
##                                                  1 
##                         Total.debt.Total.net.worth 
##                                                  1 
##                        Current.Liability.to.Assets 
##                                                  1 
##                               Total.expense.Assets 
##                                                  1 
##                      Current.Liabilities.Liability 
##                                                  1 
##              Long.term.Liability.to.Current.Assets 
##                                                  1 
##                         Total.income.Total.expense 
##                                                  1 
##                        Operating.profit.per.person 
##                                                  1 
##                   Operating.profit.Paid.in.capital 
##                                                  1 
##                Operating.Profit.Per.Share..Yuan... 
##                                                  1 
##                             Operating.Gross.Margin 
##                                                  1 
##                      Equity.to.Long.term.Liability 
##                                                  1 
##                             Cash.Current.Liability 
##                                                  1 
##                     Current.Liability.to.Liability 
##                                                  1 
##                    Net.Worth.Turnover.Rate..times. 
##                                                  1 
##                         Current.Liabilities.Equity 
##                                                  1 
##                        Current.Liability.to.Equity 
##                                                  1 
##                        Realized.Sales.Gross.Margin 
##                                                  1 
## 
## Node number 1: 4773 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.03226482  P(node) =1
##     class counts:  4619   154
##    probabilities: 0.968 0.032 
##   left son=2 (4637 obs) right son=3 (136 obs)
##   Primary splits:
##       Net.Value.Growth.Rate                    < 0.0003670415 to the right, improve=41.90007, (0 missing)
##       Net.Income.to.Stockholder.s.Equity       < 0.8344808    to the right, improve=38.84998, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1807696    to the right, improve=38.56514, (0 missing)
##       Net.profit.before.tax.Paid.in.capital    < 0.1406304    to the right, improve=36.02657, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1486782    to the right, improve=33.70200, (0 missing)
##   Surrogate splits:
##       Net.Income.to.Stockholder.s.Equity       < 0.8334933    to the right, agree=0.992, adj=0.706, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1385529    to the right, agree=0.985, adj=0.463, (0 split)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1769405    to the right, agree=0.984, adj=0.449, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1361       to the right, agree=0.983, adj=0.404, (0 split)
##       Net.Income.to.Total.Assets               < 0.7094973    to the right, agree=0.981, adj=0.331, (0 split)
## 
## Node number 2: 4637 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.0209187  P(node) =0.9715064
##     class counts:  4540    97
##    probabilities: 0.979 0.021 
##   left son=4 (4359 obs) right son=5 (278 obs)
##   Primary splits:
##       Borrowing.dependency                    < 0.3826258    to the left,  improve=11.158660, (0 missing)
##       Working.Capital.Equity                  < 0.7274442    to the right, improve= 9.783354, (0 missing)
##       Degree.of.Financial.Leverage..DFL.      < 0.02669723   to the right, improve= 9.749996, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.2012385    to the right, improve= 8.927490, (0 missing)
##       Interest.Expense.Ratio                  < 0.6303504    to the right, improve= 8.555154, (0 missing)
##   Surrogate splits:
##       Liability.to.Equity        < 0.2869906    to the left,  agree=0.960, adj=0.338, (0 split)
##       Debt.ratio..               < 0.1922484    to the left,  agree=0.960, adj=0.335, (0 split)
##       Net.worth.Assets           < 0.8077516    to the right, agree=0.960, adj=0.335, (0 split)
##       Equity.to.Liability        < 0.01809593   to the right, agree=0.960, adj=0.335, (0 split)
##       Total.debt.Total.net.worth < 0.01683811   to the left,  agree=0.959, adj=0.317, (0 split)
## 
## Node number 3: 136 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.4191176  P(node) =0.02849361
##     class counts:    79    57
##    probabilities: 0.581 0.419 
##   left son=6 (38 obs) right son=7 (98 obs)
##   Primary splits:
##       Quick.Ratio                           < 0.005814193  to the right, improve=8.720051, (0 missing)
##       Quick.Assets.Current.Liability        < 0.003547663  to the right, improve=8.070588, (0 missing)
##       Net.profit.before.tax.Paid.in.capital < 0.1066247    to the right, improve=7.902999, (0 missing)
##       Working.capitcal.Turnover.Rate        < 0.5939439    to the right, improve=7.695608, (0 missing)
##       Cash.Total.Assets                     < 0.00943074   to the right, improve=7.537255, (0 missing)
##   Surrogate splits:
##       Quick.Assets.Current.Liability      < 0.005713739  to the right, agree=0.941, adj=0.789, (0 split)
##       Current.Ratio                       < 0.009712606  to the right, agree=0.890, adj=0.605, (0 split)
##       Working.capitcal.Turnover.Rate      < 0.5939439    to the right, agree=0.890, adj=0.605, (0 split)
##       Current.Liability.to.Current.Assets < 0.0300781    to the left,  agree=0.890, adj=0.605, (0 split)
##       Working.Capital.to.Total.Assets     < 0.7725806    to the right, agree=0.860, adj=0.500, (0 split)
## 
## Node number 4: 4359 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.01215875  P(node) =0.9132621
##     class counts:  4306    53
##    probabilities: 0.988 0.012 
##   left son=8 (4100 obs) right son=9 (259 obs)
##   Primary splits:
##       Persistent.EPS.in.the.Last.Four.Seasons            < 0.1997258    to the right, improve=3.569293, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02671931   to the right, improve=3.116037, (0 missing)
##       Total.income.Total.expense                         < 0.002060573  to the right, improve=3.069896, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5647908    to the right, improve=3.023697, (0 missing)
##       Interest.Expense.Ratio                             < 0.6303504    to the right, improve=2.964888, (0 missing)
##   Surrogate splits:
##       Net.profit.before.tax.Paid.in.capital             < 0.1565116    to the right, agree=0.980, adj=0.664, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan...          < 0.1575281    to the right, agree=0.978, adj=0.633, (0 split)
##       Net.Income.to.Stockholder.s.Equity                < 0.8378972    to the right, agree=0.973, adj=0.548, (0 split)
##       ROA.B..before.interest.and.depreciation.after.tax < 0.4829755    to the right, agree=0.967, adj=0.444, (0 split)
##       Net.Income.to.Total.Assets                        < 0.7680602    to the right, agree=0.965, adj=0.413, (0 split)
## 
## Node number 5: 278 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.1582734  P(node) =0.05824429
##     class counts:   234    44
##    probabilities: 0.842 0.158 
##   left son=10 (197 obs) right son=11 (81 obs)
##   Primary splits:
##       Non.industry.income.and.expenditure.revenue        < 0.303409     to the right, improve=9.121639, (0 missing)
##       Continuous.interest.rate..after.tax.               < 0.7815287    to the right, improve=8.355707, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan...           < 0.1720268    to the right, improve=8.280004, (0 missing)
##       After.tax.net.Interest.Rate                        < 0.8092315    to the right, improve=8.244631, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5606346    to the right, improve=8.196150, (0 missing)
##   Surrogate splits:
##       After.tax.net.Interest.Rate              < 0.8092007    to the right, agree=0.831, adj=0.420, (0 split)
##       Pre.tax.net.Interest.Rate                < 0.7973282    to the right, agree=0.827, adj=0.407, (0 split)
##       Continuous.interest.rate..after.tax.     < 0.7814471    to the right, agree=0.817, adj=0.370, (0 split)
##       Total.income.Total.expense               < 0.002123953  to the right, agree=0.802, adj=0.321, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1628756    to the right, agree=0.791, adj=0.284, (0 split)
## 
## Node number 6: 38 observations
##   predicted class=0  expected loss=0.1315789  P(node) =0.00796145
##     class counts:    33     5
##    probabilities: 0.868 0.132 
## 
## Node number 7: 98 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.4693878  P(node) =0.02053216
##     class counts:    46    52
##    probabilities: 0.469 0.531 
##   left son=14 (22 obs) right son=15 (76 obs)
##   Primary splits:
##       Revenue.per.person                      < 0.00693859   to the left,  improve=6.902451, (0 missing)
##       Operating.profit.per.person             < 0.371379     to the right, improve=6.275150, (0 missing)
##       Realized.Sales.Gross.Profit.Growth.Rate < 0.02222639   to the right, improve=5.580371, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=5.173469, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1805332    to the right, improve=5.153856, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate                       < 0.9982432    to the left,  agree=0.837, adj=0.273, (0 split)
##       Revenue.Per.Share..Yuan...                  < 0.006692681  to the left,  agree=0.816, adj=0.182, (0 split)
##       Non.industry.income.and.expenditure.revenue < 0.303533     to the right, agree=0.806, adj=0.136, (0 split)
##       Continuous.interest.rate..after.tax.        < 0.7800147    to the left,  agree=0.806, adj=0.136, (0 split)
##       Accounts.Receivable.Turnover                < 0.0004027472 to the left,  agree=0.806, adj=0.136, (0 split)
## 
## Node number 8: 4100 observations
##   predicted class=0  expected loss=0.007073171  P(node) =0.8589985
##     class counts:  4071    29
##    probabilities: 0.993 0.007 
## 
## Node number 9: 259 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.09266409  P(node) =0.05426357
##     class counts:   235    24
##    probabilities: 0.907 0.093 
##   left son=18 (222 obs) right son=19 (37 obs)
##   Primary splits:
##       Interest.Expense.Ratio                             < 0.6298537    to the right, improve=5.777349, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02660342   to the right, improve=4.434238, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5640698    to the right, improve=4.434238, (0 missing)
##       Cash.Total.Assets                                  < 0.01088273   to the right, improve=4.153023, (0 missing)
##       Interest.bearing.debt.interest.rate                < 0.0004480448 to the left,  improve=4.049067, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02660342   to the right, agree=0.981, adj=0.865, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5640698    to the right, agree=0.981, adj=0.865, (0 split)
##       ROA.A..before.interest.and...after.tax             < 0.497138     to the left,  agree=0.892, adj=0.243, (0 split)
##       Cash.Current.Liability                             < 0.000247404  to the right, agree=0.876, adj=0.135, (0 split)
##       Net.Income.to.Total.Assets                         < 0.7717563    to the left,  agree=0.876, adj=0.135, (0 split)
## 
## Node number 10: 197 observations
##   predicted class=0  expected loss=0.07614213  P(node) =0.04127383
##     class counts:   182    15
##    probabilities: 0.924 0.076 
## 
## Node number 11: 81 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.3580247  P(node) =0.01697046
##     class counts:    52    29
##    probabilities: 0.642 0.358 
##   left son=22 (51 obs) right son=23 (30 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate                < 0.0006460646 to the left,  improve=7.222803, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5620761    to the right, improve=6.365003, (0 missing)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, improve=5.206994, (0 missing)
##       Operating.Expense.Rate                             < 4.585e+09    to the left,  improve=4.679868, (0 missing)
##       Revenue.Per.Share..Yuan...                         < 0.04138875   to the left,  improve=4.457103, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02643784   to the right, agree=0.840, adj=0.567, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5626205    to the right, agree=0.778, adj=0.400, (0 split)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Current.Liability                             < 0.0002664469 to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Total.Assets                                  < 0.01247115   to the right, agree=0.728, adj=0.267, (0 split)
## 
## Node number 14: 22 observations
##   predicted class=0  expected loss=0.1818182  P(node) =0.00460926
##     class counts:    18     4
##    probabilities: 0.818 0.182 
## 
## Node number 15: 76 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.3684211  P(node) =0.0159229
##     class counts:    28    48
##    probabilities: 0.368 0.632 
##   left son=30 (8 obs) right son=31 (68 obs)
##   Primary splits:
##       Continuous.interest.rate..after.tax.    < 0.7814384    to the right, improve=7.133127, (0 missing)
##       Fixed.Assets.Turnover.Frequency         < 0.001012572  to the right, improve=5.035320, (0 missing)
##       Total.Asset.Return.Growth.Rate.Ratio    < 0.2615727    to the right, improve=4.658744, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=4.257310, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1793042    to the right, improve=3.995872, (0 missing)
##   Surrogate splits:
##       Pre.tax.net.Interest.Rate       < 0.7972556    to the right, agree=0.974, adj=0.750, (0 split)
##       After.tax.net.Interest.Rate     < 0.8091983    to the right, agree=0.974, adj=0.750, (0 split)
##       Revenue.Per.Share..Yuan...      < 0.08244476   to the right, agree=0.947, adj=0.500, (0 split)
##       Total.Asset.Turnover            < 0.3718141    to the right, agree=0.947, adj=0.500, (0 split)
##       Net.Worth.Turnover.Rate..times. < 0.1894355    to the right, agree=0.934, adj=0.375, (0 split)
## 
## Node number 18: 222 observations
##   predicted class=0  expected loss=0.04954955  P(node) =0.04651163
##     class counts:   211    11
##    probabilities: 0.950 0.050 
## 
## Node number 19: 37 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.3513514  P(node) =0.007751938
##     class counts:    24    13
##    probabilities: 0.649 0.351 
##   left son=38 (28 obs) right son=39 (9 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate  < 0.0007065707 to the left,  improve=4.325182, (0 missing)
##       Fixed.Assets.Turnover.Frequency      < 4.235e+09    to the left,  improve=3.531532, (0 missing)
##       Continuous.interest.rate..after.tax. < 0.781343     to the right, improve=3.435453, (0 missing)
##       Inventory.Current.Liability          < 0.004639869  to the right, improve=3.435453, (0 missing)
##       Cash.Total.Assets                    < 0.01279077   to the right, improve=3.331532, (0 missing)
##   Surrogate splits:
##       Total.debt.Total.net.worth < 0.01745365   to the left,  agree=0.838, adj=0.333, (0 split)
##       Debt.ratio..               < 0.1946216    to the left,  agree=0.838, adj=0.333, (0 split)
##       Net.worth.Assets           < 0.8053784    to the right, agree=0.838, adj=0.333, (0 split)
##       Liability.to.Equity        < 0.2874368    to the left,  agree=0.838, adj=0.333, (0 split)
##       Equity.to.Liability        < 0.01783194   to the right, agree=0.838, adj=0.333, (0 split)
## 
## Node number 22: 51 observations,    complexity param=0.01623377
##   predicted class=0  expected loss=0.1960784  P(node) =0.0106851
##     class counts:    41    10
##    probabilities: 0.804 0.196 
##   left son=44 (24 obs) right son=45 (27 obs)
##   Primary splits:
##       Operating.profit.per.person         < 0.3922684    to the right, improve=3.485839, (0 missing)
##       Net.Income.to.Total.Assets          < 0.7711328    to the right, improve=3.431967, (0 missing)
##       Operating.Profit.Rate               < 0.9989727    to the right, improve=3.221289, (0 missing)
##       Operating.Profit.Per.Share..Yuan... < 0.09620552   to the right, improve=2.974983, (0 missing)
##       Operating.profit.Paid.in.capital    < 0.09617796   to the right, improve=2.974983, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate               < 0.998964     to the right, agree=0.980, adj=0.958, (0 split)
##       Operating.Profit.Per.Share..Yuan... < 0.09543197   to the right, agree=0.980, adj=0.958, (0 split)
##       Operating.profit.Paid.in.capital    < 0.09544514   to the right, agree=0.980, adj=0.958, (0 split)
##       Operating.Gross.Margin              < 0.6010392    to the right, agree=0.843, adj=0.667, (0 split)
##       Realized.Sales.Gross.Margin         < 0.6010825    to the right, agree=0.843, adj=0.667, (0 split)
## 
## Node number 23: 30 observations,    complexity param=0.01731602
##   predicted class=1  expected loss=0.3666667  P(node) =0.006285355
##     class counts:    11    19
##    probabilities: 0.367 0.633 
##   left son=46 (20 obs) right son=47 (10 obs)
##   Primary splits:
##       Long.term.Liability.to.Current.Assets                   < 0.004452273  to the right, improve=4.033333, (0 missing)
##       ROA.C..before.interest.and.depreciation.before.interest < 0.482377     to the right, improve=3.600000, (0 missing)
##       Revenue.per.person                                      < 0.04380682   to the left,  improve=3.457143, (0 missing)
##       Allocation.rate.per.person                              < 0.05656835   to the left,  improve=2.933333, (0 missing)
##       ROA.B..before.interest.and.depreciation.after.tax       < 0.531854     to the right, improve=2.838311, (0 missing)
##   Surrogate splits:
##       Equity.to.Long.term.Liability  < 0.1208088    to the right, agree=0.933, adj=0.8, (0 split)
##       Revenue.per.person             < 0.04817649   to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liability.to.Assets    < 0.1569419    to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liabilities.Liability  < 0.7965408    to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liability.to.Liability < 0.7965408    to the left,  agree=0.900, adj=0.7, (0 split)
## 
## Node number 30: 8 observations
##   predicted class=0  expected loss=0  P(node) =0.001676095
##     class counts:     8     0
##    probabilities: 1.000 0.000 
## 
## Node number 31: 68 observations,    complexity param=0.01948052
##   predicted class=1  expected loss=0.2941176  P(node) =0.0142468
##     class counts:    20    48
##    probabilities: 0.294 0.706 
##   left son=62 (42 obs) right son=63 (26 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate                < 0.0004680468 to the left,  improve=3.971558, (0 missing)
##       Total.Asset.Return.Growth.Rate.Ratio               < 0.2615727    to the right, improve=3.050109, (0 missing)
##       Interest.Expense.Ratio                             < 0.6302102    to the right, improve=2.999522, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02668064   to the right, improve=2.999522, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5645698    to the right, improve=2.999522, (0 missing)
##   Surrogate splits:
##       Interest.Expense.Ratio                             < 0.6301963    to the right, agree=0.765, adj=0.385, (0 split)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02667625   to the right, agree=0.765, adj=0.385, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5645436    to the right, agree=0.765, adj=0.385, (0 split)
##       Total.income.Total.expense                         < 0.00203894   to the left,  agree=0.721, adj=0.269, (0 split)
##       Operating.Gross.Margin                             < 0.6021058    to the left,  agree=0.706, adj=0.231, (0 split)
## 
## Node number 38: 28 observations
##   predicted class=0  expected loss=0.2142857  P(node) =0.005866331
##     class counts:    22     6
##    probabilities: 0.786 0.214 
## 
## Node number 39: 9 observations
##   predicted class=1  expected loss=0.2222222  P(node) =0.001885607
##     class counts:     2     7
##    probabilities: 0.222 0.778 
## 
## Node number 44: 24 observations
##   predicted class=0  expected loss=0  P(node) =0.005028284
##     class counts:    24     0
##    probabilities: 1.000 0.000 
## 
## Node number 45: 27 observations,    complexity param=0.01623377
##   predicted class=0  expected loss=0.3703704  P(node) =0.00565682
##     class counts:    17    10
##    probabilities: 0.630 0.370 
##   left son=90 (16 obs) right son=91 (11 obs)
##   Primary splits:
##       After.tax.net.Interest.Rate           < 0.8090741    to the left,  improve=4.728956, (0 missing)
##       Continuous.interest.rate..after.tax.  < 0.7813489    to the left,  improve=4.728956, (0 missing)
##       Revenue.Per.Share..Yuan...            < 0.02714884   to the left,  improve=4.478307, (0 missing)
##       Research.and.development.expense.rate < 3.7e+08      to the left,  improve=3.792593, (0 missing)
##       Pre.tax.net.Interest.Rate             < 0.7971327    to the left,  improve=3.451416, (0 missing)
##   Surrogate splits:
##       Pre.tax.net.Interest.Rate            < 0.7971327    to the left,  agree=0.963, adj=0.909, (0 split)
##       Continuous.interest.rate..after.tax. < 0.7813183    to the left,  agree=0.926, adj=0.818, (0 split)
##       Operating.Profit.Rate                < 0.99886      to the left,  agree=0.852, adj=0.636, (0 split)
##       Revenue.Per.Share..Yuan...           < 0.0214317    to the left,  agree=0.852, adj=0.636, (0 split)
##       Total.Asset.Turnover                 < 0.05172414   to the left,  agree=0.815, adj=0.545, (0 split)
## 
## Node number 46: 20 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.45  P(node) =0.004190237
##     class counts:    11     9
##    probabilities: 0.550 0.450 
##   left son=92 (10 obs) right son=93 (10 obs)
##   Primary splits:
##       Current.Liabilities.Equity                  < 0.3359431    to the right, improve=2.500000, (0 missing)
##       Current.Liability.to.Equity                 < 0.3359431    to the right, improve=2.500000, (0 missing)
##       Operating.Gross.Margin                      < 0.6057849    to the right, improve=2.031868, (0 missing)
##       Realized.Sales.Gross.Margin                 < 0.6057849    to the right, improve=2.031868, (0 missing)
##       Inventory.and.accounts.receivable.Net.value < 0.4024762    to the right, improve=2.031868, (0 missing)
##   Surrogate splits:
##       Current.Liability.to.Equity          < 0.3359431    to the right, agree=1.0, adj=1.0, (0 split)
##       Current.Liability.to.Assets          < 0.1203266    to the right, agree=0.9, adj=0.8, (0 split)
##       Long.term.fund.suitability.ratio..A. < 0.004984772  to the left,  agree=0.8, adj=0.6, (0 split)
##       Working.Capital.to.Total.Assets      < 0.7096198    to the left,  agree=0.8, adj=0.6, (0 split)
##       Current.Liabilities.Liability        < 0.452111     to the right, agree=0.8, adj=0.6, (0 split)
## 
## Node number 47: 10 observations
##   predicted class=1  expected loss=0  P(node) =0.002095118
##     class counts:     0    10
##    probabilities: 0.000 1.000 
## 
## Node number 62: 42 observations,    complexity param=0.01948052
##   predicted class=1  expected loss=0.4285714  P(node) =0.008799497
##     class counts:    18    24
##    probabilities: 0.429 0.571 
##   left son=124 (30 obs) right son=125 (12 obs)
##   Primary splits:
##       Total.Asset.Return.Growth.Rate.Ratio < 0.2615727    to the right, improve=6.171429, (0 missing)
##       Net.Value.Growth.Rate                < 0.0002560444 to the right, improve=5.474654, (0 missing)
##       Continuous.interest.rate..after.tax. < 0.7809941    to the right, improve=5.084014, (0 missing)
##       Pre.tax.net.Interest.Rate            < 0.7966958    to the right, improve=4.763736, (0 missing)
##       After.tax.net.Interest.Rate          < 0.808647     to the right, improve=4.763736, (0 missing)
##   Surrogate splits:
##       ROA.A..before.interest.and...after.tax   < 0.3072122    to the right, agree=0.929, adj=0.750, (0 split)
##       Total.expense.Assets                     < 0.1203253    to the left,  agree=0.929, adj=0.750, (0 split)
##       Net.Income.to.Total.Assets               < 0.6176428    to the right, agree=0.929, adj=0.750, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1038638    to the right, agree=0.905, adj=0.667, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1057386    to the right, agree=0.905, adj=0.667, (0 split)
## 
## Node number 63: 26 observations
##   predicted class=1  expected loss=0.07692308  P(node) =0.005447308
##     class counts:     2    24
##    probabilities: 0.077 0.923 
## 
## Node number 90: 16 observations
##   predicted class=0  expected loss=0.125  P(node) =0.003352189
##     class counts:    14     2
##    probabilities: 0.875 0.125 
## 
## Node number 91: 11 observations
##   predicted class=1  expected loss=0.2727273  P(node) =0.00230463
##     class counts:     3     8
##    probabilities: 0.273 0.727 
## 
## Node number 92: 10 observations
##   predicted class=0  expected loss=0.2  P(node) =0.002095118
##     class counts:     8     2
##    probabilities: 0.800 0.200 
## 
## Node number 93: 10 observations
##   predicted class=1  expected loss=0.3  P(node) =0.002095118
##     class counts:     3     7
##    probabilities: 0.300 0.700 
## 
## Node number 124: 30 observations
##   predicted class=0  expected loss=0.4  P(node) =0.006285355
##     class counts:    18    12
##    probabilities: 0.600 0.400 
## 
## Node number 125: 12 observations
##   predicted class=1  expected loss=0  P(node) =0.002514142
##     class counts:     0    12
##    probabilities: 0.000 1.000
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

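# Class probabilities on the test set; column "1" holds P(Bankrupt. = 1)
y <- predict(tree1, newdata = dataTE, type = "prob")

# Label an instance as bankrupt when P(Bankrupt. = 1) exceeds 0.5.
# A negated logical such as -(y[, 1] > 0.5) is a numeric index of 0s and -1s,
# so it only drops element 1; with that earlier indexing every prediction
# collapsed to 0, as the table below shows.
y1 <- as.integer(y[, "1"] > 0.5)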
table(as.factor(y1),dataTE$Bankrupt.)
##    
##        0    1
##   0 1980   66
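The table above has a single predicted row because no test instance was labelled 1, so the accuracy (1980/2046, about 96.8%) hides the fact that none of the 66 bankrupt firms is detected. A minimal sketch of a confusion matrix that keeps both classes visible and reports sensitivity follows. The matrix `prob` and the vector `actual` are made-up stand-ins for `predict(tree1, newdata = dataTE, type = "prob")` and `dataTE$Bankrupt.`; they are not values from this dataset.

```r
# Hypothetical class probabilities (columns named after the class labels)
prob <- cbind(`0` = c(0.99, 0.93, 0.37, 0.08, 0.60),
              `1` = c(0.01, 0.07, 0.63, 0.92, 0.40))
# Hypothetical true labels for the same five instances
actual <- c(0, 0, 1, 1, 1)

# Predict class 1 when its probability exceeds 0.5
pred <- as.integer(prob[, "1"] > 0.5)

# Fixing the factor levels keeps a row for class 1 even if it is never predicted
cm <- table(predicted = factor(pred,   levels = c(0, 1)),
            actual    = factor(actual, levels = c(0, 1)))

# Share of true positives that are found, and share of true negatives that are kept
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["0", "0"] / sum(cm[, "0"])

print(cm)
print(c(sensitivity = sensitivity, specificity = specificity))
```

For an imbalanced target such as this one, sensitivity is probably more informative than overall accuracy, and a threshold below 0.5 may be worth trying.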
length(y1)
## [1] 2046
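The refit below prints a complexity-parameter (CP) table whose `xerror` column is the cross-validated error relative to the root node. One common way to choose the tree size is to take the row with the smallest `xerror`. The sketch below does this in base R on a copy of that table; the data frame `cp_table` and the column name `rel_error` are introduced here for illustration only, and the `xerror` values depend on the random cross-validation folds of this particular run.

```r
# CP table copied from the summary(tree1) output of the maxdepth = 7 fit
cp_table <- data.frame(
  CP        = c(0.04329004, 0.02597403, 0.01731602, 0.01623377, 0.01082251, 0.01000000),
  nsplit    = c(0, 4, 7, 12, 14, 17),
  rel_error = c(1.0000000, 0.8181818, 0.7402597, 0.6493506, 0.6168831, 0.5844156),
  xerror    = c(1.000000, 1.032468, 1.058442, 1.045455, 1.058442, 1.077922),
  xstd      = c(0.07927165, 0.08050464, 0.08147565, 0.08099182, 0.08147565, 0.08219525)
)

# Row with the lowest cross-validated error
best        <- which.min(cp_table$xerror)
best_cp     <- cp_table$CP[best]
best_nsplit <- cp_table$nsplit[best]

print(cp_table[best, ])
```

On a fitted object the same table is available as `tree1$cptable`, and the tree could then be pruned with `prune(tree1, cp = best_cp)`. Here the smallest cross-validated error belongs to the root (0 splits): the training error keeps falling as splits are added, but the cross-validated error does not, which suggests the deeper splits may be overfitting the few bankrupt cases.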
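# Refit the classification tree with depth capped at 7 and 10-fold cross-validation
tree1 <- rpart(dataTR$Bankrupt. ~ ., method = "class", data = dataTR, maxdepth = 7, xval = 10)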
summary(tree1)
## Call:
## rpart(formula = dataTR$Bankrupt. ~ ., data = dataTR, method = "class", 
##     maxdepth = 7, xval = 10)
##   n= 4773 
## 
##           CP nsplit rel error   xerror       xstd
## 1 0.04329004      0 1.0000000 1.000000 0.07927165
## 2 0.02597403      4 0.8181818 1.032468 0.08050464
## 3 0.01731602      7 0.7402597 1.058442 0.08147565
## 4 0.01623377     12 0.6493506 1.045455 0.08099182
## 5 0.01082251     14 0.6168831 1.058442 0.08147565
## 6 0.01000000     17 0.5844156 1.077922 0.08219525
## 
## Variable importance
##                                   Net.Value.Growth.Rate 
##                                                       9 
##                      Net.Income.to.Stockholder.s.Equity 
##                                                       7 
##                Per.Share.Net.profit.before.tax..Yuan... 
##                                                       6 
##                   Net.profit.before.tax.Paid.in.capital 
##                                                       6 
##                              Net.Income.to.Total.Assets 
##                                                       5 
##                 Persistent.EPS.in.the.Last.Four.Seasons 
##                                                       5 
##                     Interest.bearing.debt.interest.rate 
##                                                       3 
##                    Continuous.interest.rate..after.tax. 
##                                                       3 
##                             After.tax.net.Interest.Rate 
##                                                       3 
##                               Pre.tax.net.Interest.Rate 
##                                                       3 
##                                  Interest.Expense.Ratio 
##                                                       3 
##                                    Borrowing.dependency 
##                                                       2 
##                      Degree.of.Financial.Leverage..DFL. 
##                                                       2 
##                  ROA.A..before.interest.and...after.tax 
##                                                       2 
##                                    Total.expense.Assets 
##                                                       2 
##             Non.industry.income.and.expenditure.revenue 
##                                                       2 
##                                      Revenue.per.person 
##                                                       2 
##      Interest.Coverage.Ratio..Interest.expense.to.EBIT. 
##                                                       2 
##                                             Quick.Ratio 
##                                                       2 
##                                   Operating.Profit.Rate 
##                                                       2 
##                              Revenue.Per.Share..Yuan... 
##                                                       2 
##                          Quick.Assets.Current.Liability 
##                                                       1 
##                    Total.Asset.Return.Growth.Rate.Ratio 
##                                                       1 
##                                    Total.Asset.Turnover 
##                                                       1 
##                         Working.Capital.to.Total.Assets 
##                                                       1 
##       ROA.B..before.interest.and.depreciation.after.tax 
##                                                       1 
##                     Current.Liability.to.Current.Assets 
##                                                       1 
##                                           Current.Ratio 
##                                                       1 
##                          Working.capitcal.Turnover.Rate 
##                                                       1 
##                                     Liability.to.Equity 
##                                                       1 
##                                            Debt.ratio.. 
##                                                       1 
##                                     Equity.to.Liability 
##                                                       1 
##                                        Net.worth.Assets 
##                                                       1 
##                              Total.debt.Total.net.worth 
##                                                       1 
##                             Current.Liability.to.Assets 
##                                                       1 
##                           Current.Liabilities.Liability 
##                                                       1 
##                   Long.term.Liability.to.Current.Assets 
##                                                       1 
##                              Total.income.Total.expense 
##                                                       1 
## ROA.C..before.interest.and.depreciation.before.interest 
##                                                       1 
##                             Operating.profit.per.person 
##                                                       1 
##                        Operating.profit.Paid.in.capital 
##                                                       1 
##                     Operating.Profit.Per.Share..Yuan... 
##                                                       1 
##                                  Operating.Gross.Margin 
##                                                       1 
##                           Equity.to.Long.term.Liability 
##                                                       1 
##                                  Cash.Current.Liability 
##                                                       1 
##                          Current.Liability.to.Liability 
##                                                       1 
##                         Net.Worth.Turnover.Rate..times. 
##                                                       1 
##                              Current.Liabilities.Equity 
##                                                       1 
##                             Current.Liability.to.Equity 
##                                                       1 
## 
## Node number 1: 4773 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.03226482  P(node) =1
##     class counts:  4619   154
##    probabilities: 0.968 0.032 
##   left son=2 (4637 obs) right son=3 (136 obs)
##   Primary splits:
##       Net.Value.Growth.Rate                    < 0.0003670415 to the right, improve=41.90007, (0 missing)
##       Net.Income.to.Stockholder.s.Equity       < 0.8344808    to the right, improve=38.84998, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1807696    to the right, improve=38.56514, (0 missing)
##       Net.profit.before.tax.Paid.in.capital    < 0.1406304    to the right, improve=36.02657, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1486782    to the right, improve=33.70200, (0 missing)
##   Surrogate splits:
##       Net.Income.to.Stockholder.s.Equity       < 0.8334933    to the right, agree=0.992, adj=0.706, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1385529    to the right, agree=0.985, adj=0.463, (0 split)
##       Persistent.EPS.in.the.Last.Four.Seasons  < 0.1769405    to the right, agree=0.984, adj=0.449, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1361       to the right, agree=0.983, adj=0.404, (0 split)
##       Net.Income.to.Total.Assets               < 0.7094973    to the right, agree=0.981, adj=0.331, (0 split)
## 
## Node number 2: 4637 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.0209187  P(node) =0.9715064
##     class counts:  4540    97
##    probabilities: 0.979 0.021 
##   left son=4 (4359 obs) right son=5 (278 obs)
##   Primary splits:
##       Borrowing.dependency                    < 0.3826258    to the left,  improve=11.158660, (0 missing)
##       Working.Capital.Equity                  < 0.7274442    to the right, improve= 9.783354, (0 missing)
##       Degree.of.Financial.Leverage..DFL.      < 0.02669723   to the right, improve= 9.749996, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.2012385    to the right, improve= 8.927490, (0 missing)
##       Interest.Expense.Ratio                  < 0.6303504    to the right, improve= 8.555154, (0 missing)
##   Surrogate splits:
##       Liability.to.Equity        < 0.2869906    to the left,  agree=0.960, adj=0.338, (0 split)
##       Debt.ratio..               < 0.1922484    to the left,  agree=0.960, adj=0.335, (0 split)
##       Net.worth.Assets           < 0.8077516    to the right, agree=0.960, adj=0.335, (0 split)
##       Equity.to.Liability        < 0.01809593   to the right, agree=0.960, adj=0.335, (0 split)
##       Total.debt.Total.net.worth < 0.01683811   to the left,  agree=0.959, adj=0.317, (0 split)
## 
## Node number 3: 136 observations,    complexity param=0.04329004
##   predicted class=0  expected loss=0.4191176  P(node) =0.02849361
##     class counts:    79    57
##    probabilities: 0.581 0.419 
##   left son=6 (38 obs) right son=7 (98 obs)
##   Primary splits:
##       Quick.Ratio                           < 0.005814193  to the right, improve=8.720051, (0 missing)
##       Quick.Assets.Current.Liability        < 0.003547663  to the right, improve=8.070588, (0 missing)
##       Net.profit.before.tax.Paid.in.capital < 0.1066247    to the right, improve=7.902999, (0 missing)
##       Working.capitcal.Turnover.Rate        < 0.5939439    to the right, improve=7.695608, (0 missing)
##       Cash.Total.Assets                     < 0.00943074   to the right, improve=7.537255, (0 missing)
##   Surrogate splits:
##       Quick.Assets.Current.Liability      < 0.005713739  to the right, agree=0.941, adj=0.789, (0 split)
##       Current.Ratio                       < 0.009712606  to the right, agree=0.890, adj=0.605, (0 split)
##       Working.capitcal.Turnover.Rate      < 0.5939439    to the right, agree=0.890, adj=0.605, (0 split)
##       Current.Liability.to.Current.Assets < 0.0300781    to the left,  agree=0.890, adj=0.605, (0 split)
##       Working.Capital.to.Total.Assets     < 0.7725806    to the right, agree=0.860, adj=0.500, (0 split)
## 
## Node number 4: 4359 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.01215875  P(node) =0.9132621
##     class counts:  4306    53
##    probabilities: 0.988 0.012 
##   left son=8 (4100 obs) right son=9 (259 obs)
##   Primary splits:
##       Persistent.EPS.in.the.Last.Four.Seasons            < 0.1997258    to the right, improve=3.569293, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02671931   to the right, improve=3.116037, (0 missing)
##       Total.income.Total.expense                         < 0.002060573  to the right, improve=3.069896, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5647908    to the right, improve=3.023697, (0 missing)
##       Interest.Expense.Ratio                             < 0.6303504    to the right, improve=2.964888, (0 missing)
##   Surrogate splits:
##       Net.profit.before.tax.Paid.in.capital             < 0.1565116    to the right, agree=0.980, adj=0.664, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan...          < 0.1575281    to the right, agree=0.978, adj=0.633, (0 split)
##       Net.Income.to.Stockholder.s.Equity                < 0.8378972    to the right, agree=0.973, adj=0.548, (0 split)
##       ROA.B..before.interest.and.depreciation.after.tax < 0.4829755    to the right, agree=0.967, adj=0.444, (0 split)
##       Net.Income.to.Total.Assets                        < 0.7680602    to the right, agree=0.965, adj=0.413, (0 split)
## 
## Node number 5: 278 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.1582734  P(node) =0.05824429
##     class counts:   234    44
##    probabilities: 0.842 0.158 
##   left son=10 (197 obs) right son=11 (81 obs)
##   Primary splits:
##       Non.industry.income.and.expenditure.revenue        < 0.303409     to the right, improve=9.121639, (0 missing)
##       Continuous.interest.rate..after.tax.               < 0.7815287    to the right, improve=8.355707, (0 missing)
##       Per.Share.Net.profit.before.tax..Yuan...           < 0.1720268    to the right, improve=8.280004, (0 missing)
##       After.tax.net.Interest.Rate                        < 0.8092315    to the right, improve=8.244631, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5606346    to the right, improve=8.196150, (0 missing)
##   Surrogate splits:
##       After.tax.net.Interest.Rate              < 0.8092007    to the right, agree=0.831, adj=0.420, (0 split)
##       Pre.tax.net.Interest.Rate                < 0.7973282    to the right, agree=0.827, adj=0.407, (0 split)
##       Continuous.interest.rate..after.tax.     < 0.7814471    to the right, agree=0.817, adj=0.370, (0 split)
##       Total.income.Total.expense               < 0.002123953  to the right, agree=0.802, adj=0.321, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1628756    to the right, agree=0.791, adj=0.284, (0 split)
## 
## Node number 6: 38 observations
##   predicted class=0  expected loss=0.1315789  P(node) =0.00796145
##     class counts:    33     5
##    probabilities: 0.868 0.132 
## 
## Node number 7: 98 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.4693878  P(node) =0.02053216
##     class counts:    46    52
##    probabilities: 0.469 0.531 
##   left son=14 (22 obs) right son=15 (76 obs)
##   Primary splits:
##       Revenue.per.person                      < 0.00693859   to the left,  improve=6.902451, (0 missing)
##       Operating.profit.per.person             < 0.371379     to the right, improve=6.275150, (0 missing)
##       Realized.Sales.Gross.Profit.Growth.Rate < 0.02222639   to the right, improve=5.580371, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=5.173469, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1805332    to the right, improve=5.153856, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate                       < 0.9982432    to the left,  agree=0.837, adj=0.273, (0 split)
##       Revenue.Per.Share..Yuan...                  < 0.006692681  to the left,  agree=0.816, adj=0.182, (0 split)
##       Non.industry.income.and.expenditure.revenue < 0.303533     to the right, agree=0.806, adj=0.136, (0 split)
##       Continuous.interest.rate..after.tax.        < 0.7800147    to the left,  agree=0.806, adj=0.136, (0 split)
##       Accounts.Receivable.Turnover                < 0.0004027472 to the left,  agree=0.806, adj=0.136, (0 split)
## 
## Node number 8: 4100 observations
##   predicted class=0  expected loss=0.007073171  P(node) =0.8589985
##     class counts:  4071    29
##    probabilities: 0.993 0.007 
## 
## Node number 9: 259 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.09266409  P(node) =0.05426357
##     class counts:   235    24
##    probabilities: 0.907 0.093 
##   left son=18 (222 obs) right son=19 (37 obs)
##   Primary splits:
##       Interest.Expense.Ratio                             < 0.6298537    to the right, improve=5.777349, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02660342   to the right, improve=4.434238, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5640698    to the right, improve=4.434238, (0 missing)
##       Cash.Total.Assets                                  < 0.01088273   to the right, improve=4.153023, (0 missing)
##       Interest.bearing.debt.interest.rate                < 0.0004480448 to the left,  improve=4.049067, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02660342   to the right, agree=0.981, adj=0.865, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5640698    to the right, agree=0.981, adj=0.865, (0 split)
##       ROA.A..before.interest.and...after.tax             < 0.497138     to the left,  agree=0.892, adj=0.243, (0 split)
##       Cash.Current.Liability                             < 0.000247404  to the right, agree=0.876, adj=0.135, (0 split)
##       Net.Income.to.Total.Assets                         < 0.7717563    to the left,  agree=0.876, adj=0.135, (0 split)
## 
## Node number 10: 197 observations
##   predicted class=0  expected loss=0.07614213  P(node) =0.04127383
##     class counts:   182    15
##    probabilities: 0.924 0.076 
## 
## Node number 11: 81 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.3580247  P(node) =0.01697046
##     class counts:    52    29
##    probabilities: 0.642 0.358 
##   left son=22 (51 obs) right son=23 (30 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate                < 0.0006460646 to the left,  improve=7.222803, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5620761    to the right, improve=6.365003, (0 missing)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, improve=5.206994, (0 missing)
##       Operating.Expense.Rate                             < 4.585e+09    to the left,  improve=4.679868, (0 missing)
##       Revenue.Per.Share..Yuan...                         < 0.04138875   to the left,  improve=4.457103, (0 missing)
##   Surrogate splits:
##       Degree.of.Financial.Leverage..DFL.                 < 0.02643784   to the right, agree=0.840, adj=0.567, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5626205    to the right, agree=0.778, adj=0.400, (0 split)
##       Interest.Expense.Ratio                             < 0.6280977    to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Current.Liability                             < 0.0002664469 to the right, agree=0.741, adj=0.300, (0 split)
##       Cash.Total.Assets                                  < 0.01247115   to the right, agree=0.728, adj=0.267, (0 split)
## 
## Node number 14: 22 observations
##   predicted class=0  expected loss=0.1818182  P(node) =0.00460926
##     class counts:    18     4
##    probabilities: 0.818 0.182 
## 
## Node number 15: 76 observations,    complexity param=0.04329004
##   predicted class=1  expected loss=0.3684211  P(node) =0.0159229
##     class counts:    28    48
##    probabilities: 0.368 0.632 
##   left son=30 (8 obs) right son=31 (68 obs)
##   Primary splits:
##       Continuous.interest.rate..after.tax.    < 0.7814384    to the right, improve=7.133127, (0 missing)
##       Fixed.Assets.Turnover.Frequency         < 0.001012572  to the right, improve=5.035320, (0 missing)
##       Total.Asset.Return.Growth.Rate.Ratio    < 0.2615727    to the right, improve=4.658744, (0 missing)
##       Net.Value.Growth.Rate                   < 0.0002560444 to the right, improve=4.257310, (0 missing)
##       Persistent.EPS.in.the.Last.Four.Seasons < 0.1793042    to the right, improve=3.995872, (0 missing)
##   Surrogate splits:
##       Pre.tax.net.Interest.Rate       < 0.7972556    to the right, agree=0.974, adj=0.750, (0 split)
##       After.tax.net.Interest.Rate     < 0.8091983    to the right, agree=0.974, adj=0.750, (0 split)
##       Revenue.Per.Share..Yuan...      < 0.08244476   to the right, agree=0.947, adj=0.500, (0 split)
##       Total.Asset.Turnover            < 0.3718141    to the right, agree=0.947, adj=0.500, (0 split)
##       Net.Worth.Turnover.Rate..times. < 0.1894355    to the right, agree=0.934, adj=0.375, (0 split)
## 
## Node number 18: 222 observations
##   predicted class=0  expected loss=0.04954955  P(node) =0.04651163
##     class counts:   211    11
##    probabilities: 0.950 0.050 
## 
## Node number 19: 37 observations,    complexity param=0.01082251
##   predicted class=0  expected loss=0.3513514  P(node) =0.007751938
##     class counts:    24    13
##    probabilities: 0.649 0.351 
##   left son=38 (28 obs) right son=39 (9 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate  < 0.0007065707 to the left,  improve=4.325182, (0 missing)
##       Fixed.Assets.Turnover.Frequency      < 4.235e+09    to the left,  improve=3.531532, (0 missing)
##       Continuous.interest.rate..after.tax. < 0.781343     to the right, improve=3.435453, (0 missing)
##       Inventory.Current.Liability          < 0.004639869  to the right, improve=3.435453, (0 missing)
##       Cash.Total.Assets                    < 0.01279077   to the right, improve=3.331532, (0 missing)
##   Surrogate splits:
##       Total.debt.Total.net.worth < 0.01745365   to the left,  agree=0.838, adj=0.333, (0 split)
##       Debt.ratio..               < 0.1946216    to the left,  agree=0.838, adj=0.333, (0 split)
##       Net.worth.Assets           < 0.8053784    to the right, agree=0.838, adj=0.333, (0 split)
##       Liability.to.Equity        < 0.2874368    to the left,  agree=0.838, adj=0.333, (0 split)
##       Equity.to.Liability        < 0.01783194   to the right, agree=0.838, adj=0.333, (0 split)
## 
## Node number 22: 51 observations,    complexity param=0.01623377
##   predicted class=0  expected loss=0.1960784  P(node) =0.0106851
##     class counts:    41    10
##    probabilities: 0.804 0.196 
##   left son=44 (24 obs) right son=45 (27 obs)
##   Primary splits:
##       Operating.profit.per.person         < 0.3922684    to the right, improve=3.485839, (0 missing)
##       Net.Income.to.Total.Assets          < 0.7711328    to the right, improve=3.431967, (0 missing)
##       Operating.Profit.Rate               < 0.9989727    to the right, improve=3.221289, (0 missing)
##       Operating.Profit.Per.Share..Yuan... < 0.09620552   to the right, improve=2.974983, (0 missing)
##       Operating.profit.Paid.in.capital    < 0.09617796   to the right, improve=2.974983, (0 missing)
##   Surrogate splits:
##       Operating.Profit.Rate               < 0.998964     to the right, agree=0.980, adj=0.958, (0 split)
##       Operating.Profit.Per.Share..Yuan... < 0.09543197   to the right, agree=0.980, adj=0.958, (0 split)
##       Operating.profit.Paid.in.capital    < 0.09544514   to the right, agree=0.980, adj=0.958, (0 split)
##       Operating.Gross.Margin              < 0.6010392    to the right, agree=0.843, adj=0.667, (0 split)
##       Realized.Sales.Gross.Margin         < 0.6010825    to the right, agree=0.843, adj=0.667, (0 split)
## 
## Node number 23: 30 observations,    complexity param=0.01731602
##   predicted class=1  expected loss=0.3666667  P(node) =0.006285355
##     class counts:    11    19
##    probabilities: 0.367 0.633 
##   left son=46 (20 obs) right son=47 (10 obs)
##   Primary splits:
##       Long.term.Liability.to.Current.Assets                   < 0.004452273  to the right, improve=4.033333, (0 missing)
##       ROA.C..before.interest.and.depreciation.before.interest < 0.482377     to the right, improve=3.600000, (0 missing)
##       Revenue.per.person                                      < 0.04380682   to the left,  improve=3.457143, (0 missing)
##       Allocation.rate.per.person                              < 0.05656835   to the left,  improve=2.933333, (0 missing)
##       ROA.B..before.interest.and.depreciation.after.tax       < 0.531854     to the right, improve=2.838311, (0 missing)
##   Surrogate splits:
##       Equity.to.Long.term.Liability  < 0.1208088    to the right, agree=0.933, adj=0.8, (0 split)
##       Revenue.per.person             < 0.04817649   to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liability.to.Assets    < 0.1569419    to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liabilities.Liability  < 0.7965408    to the left,  agree=0.900, adj=0.7, (0 split)
##       Current.Liability.to.Liability < 0.7965408    to the left,  agree=0.900, adj=0.7, (0 split)
## 
## Node number 30: 8 observations
##   predicted class=0  expected loss=0  P(node) =0.001676095
##     class counts:     8     0
##    probabilities: 1.000 0.000 
## 
## Node number 31: 68 observations,    complexity param=0.02597403
##   predicted class=1  expected loss=0.2941176  P(node) =0.0142468
##     class counts:    20    48
##    probabilities: 0.294 0.706 
##   left son=62 (42 obs) right son=63 (26 obs)
##   Primary splits:
##       Interest.bearing.debt.interest.rate                < 0.0004680468 to the left,  improve=3.971558, (0 missing)
##       Total.Asset.Return.Growth.Rate.Ratio               < 0.2615727    to the right, improve=3.050109, (0 missing)
##       Interest.Expense.Ratio                             < 0.6302102    to the right, improve=2.999522, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02668064   to the right, improve=2.999522, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5645698    to the right, improve=2.999522, (0 missing)
##   Surrogate splits:
##       Interest.Expense.Ratio                             < 0.6301963    to the right, agree=0.765, adj=0.385, (0 split)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02667625   to the right, agree=0.765, adj=0.385, (0 split)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.5645436    to the right, agree=0.765, adj=0.385, (0 split)
##       Total.income.Total.expense                         < 0.00203894   to the left,  agree=0.721, adj=0.269, (0 split)
##       Operating.Gross.Margin                             < 0.6021058    to the left,  agree=0.706, adj=0.231, (0 split)
## 
## Node number 38: 28 observations
##   predicted class=0  expected loss=0.2142857  P(node) =0.005866331
##     class counts:    22     6
##    probabilities: 0.786 0.214 
## 
## Node number 39: 9 observations
##   predicted class=1  expected loss=0.2222222  P(node) =0.001885607
##     class counts:     2     7
##    probabilities: 0.222 0.778 
## 
## Node number 44: 24 observations
##   predicted class=0  expected loss=0  P(node) =0.005028284
##     class counts:    24     0
##    probabilities: 1.000 0.000 
## 
## Node number 45: 27 observations,    complexity param=0.01623377
##   predicted class=0  expected loss=0.3703704  P(node) =0.00565682
##     class counts:    17    10
##    probabilities: 0.630 0.370 
##   left son=90 (16 obs) right son=91 (11 obs)
##   Primary splits:
##       After.tax.net.Interest.Rate           < 0.8090741    to the left,  improve=4.728956, (0 missing)
##       Continuous.interest.rate..after.tax.  < 0.7813489    to the left,  improve=4.728956, (0 missing)
##       Revenue.Per.Share..Yuan...            < 0.02714884   to the left,  improve=4.478307, (0 missing)
##       Research.and.development.expense.rate < 3.7e+08      to the left,  improve=3.792593, (0 missing)
##       Pre.tax.net.Interest.Rate             < 0.7971327    to the left,  improve=3.451416, (0 missing)
##   Surrogate splits:
##       Pre.tax.net.Interest.Rate            < 0.7971327    to the left,  agree=0.963, adj=0.909, (0 split)
##       Continuous.interest.rate..after.tax. < 0.7813183    to the left,  agree=0.926, adj=0.818, (0 split)
##       Operating.Profit.Rate                < 0.99886      to the left,  agree=0.852, adj=0.636, (0 split)
##       Revenue.Per.Share..Yuan...           < 0.0214317    to the left,  agree=0.852, adj=0.636, (0 split)
##       Total.Asset.Turnover                 < 0.05172414   to the left,  agree=0.815, adj=0.545, (0 split)
## 
## Node number 46: 20 observations,    complexity param=0.01731602
##   predicted class=0  expected loss=0.45  P(node) =0.004190237
##     class counts:    11     9
##    probabilities: 0.550 0.450 
##   left son=92 (10 obs) right son=93 (10 obs)
##   Primary splits:
##       Current.Liabilities.Equity                  < 0.3359431    to the right, improve=2.500000, (0 missing)
##       Current.Liability.to.Equity                 < 0.3359431    to the right, improve=2.500000, (0 missing)
##       Operating.Gross.Margin                      < 0.6057849    to the right, improve=2.031868, (0 missing)
##       Realized.Sales.Gross.Margin                 < 0.6057849    to the right, improve=2.031868, (0 missing)
##       Inventory.and.accounts.receivable.Net.value < 0.4024762    to the right, improve=2.031868, (0 missing)
##   Surrogate splits:
##       Current.Liability.to.Equity          < 0.3359431    to the right, agree=1.0, adj=1.0, (0 split)
##       Current.Liability.to.Assets          < 0.1203266    to the right, agree=0.9, adj=0.8, (0 split)
##       Long.term.fund.suitability.ratio..A. < 0.004984772  to the left,  agree=0.8, adj=0.6, (0 split)
##       Working.Capital.to.Total.Assets      < 0.7096198    to the left,  agree=0.8, adj=0.6, (0 split)
##       Current.Liabilities.Liability        < 0.452111     to the right, agree=0.8, adj=0.6, (0 split)
## 
## Node number 47: 10 observations
##   predicted class=1  expected loss=0  P(node) =0.002095118
##     class counts:     0    10
##    probabilities: 0.000 1.000 
## 
## Node number 62: 42 observations,    complexity param=0.02597403
##   predicted class=1  expected loss=0.4285714  P(node) =0.008799497
##     class counts:    18    24
##    probabilities: 0.429 0.571 
##   left son=124 (30 obs) right son=125 (12 obs)
##   Primary splits:
##       Total.Asset.Return.Growth.Rate.Ratio < 0.2615727    to the right, improve=6.171429, (0 missing)
##       Net.Value.Growth.Rate                < 0.0002560444 to the right, improve=5.474654, (0 missing)
##       Continuous.interest.rate..after.tax. < 0.7809941    to the right, improve=5.084014, (0 missing)
##       Pre.tax.net.Interest.Rate            < 0.7966958    to the right, improve=4.763736, (0 missing)
##       After.tax.net.Interest.Rate          < 0.808647     to the right, improve=4.763736, (0 missing)
##   Surrogate splits:
##       ROA.A..before.interest.and...after.tax   < 0.3072122    to the right, agree=0.929, adj=0.750, (0 split)
##       Total.expense.Assets                     < 0.1203253    to the left,  agree=0.929, adj=0.750, (0 split)
##       Net.Income.to.Total.Assets               < 0.6176428    to the right, agree=0.929, adj=0.750, (0 split)
##       Per.Share.Net.profit.before.tax..Yuan... < 0.1038638    to the right, agree=0.905, adj=0.667, (0 split)
##       Net.profit.before.tax.Paid.in.capital    < 0.1057386    to the right, agree=0.905, adj=0.667, (0 split)
## 
## Node number 63: 26 observations
##   predicted class=1  expected loss=0.07692308  P(node) =0.005447308
##     class counts:     2    24
##    probabilities: 0.077 0.923 
## 
## Node number 90: 16 observations
##   predicted class=0  expected loss=0.125  P(node) =0.003352189
##     class counts:    14     2
##    probabilities: 0.875 0.125 
## 
## Node number 91: 11 observations
##   predicted class=1  expected loss=0.2727273  P(node) =0.00230463
##     class counts:     3     8
##    probabilities: 0.273 0.727 
## 
## Node number 92: 10 observations
##   predicted class=0  expected loss=0.2  P(node) =0.002095118
##     class counts:     8     2
##    probabilities: 0.800 0.200 
## 
## Node number 93: 10 observations
##   predicted class=1  expected loss=0.3  P(node) =0.002095118
##     class counts:     3     7
##    probabilities: 0.300 0.700 
## 
## Node number 124: 30 observations,    complexity param=0.02597403
##   predicted class=0  expected loss=0.4  P(node) =0.006285355
##     class counts:    18    12
##    probabilities: 0.600 0.400 
##   left son=248 (14 obs) right son=249 (16 obs)
##   Primary splits:
##       Total.expense.Assets                               < 0.04985467   to the right, improve=5.667857, (0 missing)
##       Total.Asset.Turnover                               < 0.07571214   to the right, improve=5.185714, (0 missing)
##       Interest.Expense.Ratio                             < 0.6303859    to the right, improve=4.789140, (0 missing)
##       Degree.of.Financial.Leverage..DFL.                 < 0.02672578   to the right, improve=4.789140, (0 missing)
##       Interest.Coverage.Ratio..Interest.expense.to.EBIT. < 0.564826     to the right, improve=4.789140, (0 missing)
##   Surrogate splits:
##       ROA.A..before.interest.and...after.tax                  < 0.4413705    to the left,  agree=0.900, adj=0.786, (0 split)
##       ROA.B..before.interest.and.depreciation.after.tax       < 0.4453665    to the left,  agree=0.867, adj=0.714, (0 split)
##       Net.Income.to.Total.Assets                              < 0.728307     to the left,  agree=0.867, adj=0.714, (0 split)
##       ROA.C..before.interest.and.depreciation.before.interest < 0.3833423    to the left,  agree=0.833, adj=0.643, (0 split)
##       Interest.Expense.Ratio                                  < 0.6302323    to the right, agree=0.833, adj=0.643, (0 split)
## 
## Node number 125: 12 observations
##   predicted class=1  expected loss=0  P(node) =0.002514142
##     class counts:     0    12
##    probabilities: 0.000 1.000 
## 
## Node number 248: 14 observations
##   predicted class=0  expected loss=0.07142857  P(node) =0.002933166
##     class counts:    13     1
##    probabilities: 0.929 0.071 
## 
## Node number 249: 16 observations
##   predicted class=1  expected loss=0.3125  P(node) =0.003352189
##     class counts:     5    11
##    probabilities: 0.313 0.688
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

y <- predict(tree1,method="class",newdata=dataTE)

y1 <- rep(0,length(dataTE$Bankrupt.))
y1[y[,1]>0.5] <- 1
y1[-(y[,1]>0.5)] <- 0
table(as.factor(y1),dataTE$Bankrupt.)
##    
##        0    1
##   0 1980   66
length(y1)
## [1] 2046
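
A minimal sketch of reading class labels off an `rpart` classification tree, using the `kyphosis` data that ships with `rpart` as a stand-in for `dataTR`/`dataTE` (an assumption; it is not the bankruptcy data). `predict()` on an `rpart` object selects its output with `type` rather than `method`: `type = "prob"` returns one probability column per factor level, in level order, and `type = "class"` returns the label directly. For the bankruptcy tree this would mean that `y[, 1]` is the probability of class 0, so a predicted bankruptcy corresponds to `y[, 2] > 0.5`. Since the printed tree has leaves with predicted class=1 (for example nodes 47, 63 and 125), labels taken this way may differ from the single-row table above.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart (not the bankruptcy data).
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One column per class, ordered like levels(kyphosis$Kyphosis): "absent", "present".
prob <- predict(fit, newdata = kyphosis, type = "prob")

# Majority class of the leaf each row falls into.
cls <- predict(fit, newdata = kyphosis, type = "class")

# Thresholding the positive-class column by name gives the same labels.
manual <- factor(ifelse(prob[, "present"] > 0.5, "present", "absent"),
                 levels = levels(kyphosis$Kyphosis))

# Rows are predictions, columns are the true classes.
conf_mat <- table(predicted = cls, actual = kyphosis$Kyphosis)
conf_mat
```

Selecting the probability column by name rather than by position avoids depending on the order of the factor levels.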

For kNN with k = 3, 5, 7, 9, 11, k = 7 performs best; however, 60 of the non-bankruptcy cases were predicted as bankruptcy and 3 of the bankruptcy cases were predicted as non-bankruptcy, i.e. there are 60 false positives and 3 false negatives. The tree failed to capture the bankrupt companies, most likely because of the heavy class imbalance.

For random forest, I tried mtry = 3, 4, 5, 6 and 7.

rf.bankrupcy=randomForest(Bankrupt.~.,data=dataTR,mtry=6,ntree=500,nodesize=5)
plot(rf.bankrupcy)

varImpPlot(rf.bankrupcy)

pred.bankrupcy = predict(rf.bankrupcy,newdata=dataTE)
table(pred.bankrupcy,dataTE[[1]])
##               
## pred.bankrupcy    0    1
##              0 1978   59
##              1    2    7
rf.bankrupcy=randomForest(Bankrupt.~.,data=dataTR,mtry=7,ntree=500,nodesize=5)
plot(rf.bankrupcy)

varImpPlot(rf.bankrupcy)

pred.bankrupcy = predict(rf.bankrupcy,newdata=dataTE)
table(pred.bankrupcy,dataTE[[1]])
##               
## pred.bankrupcy    0    1
##              0 1978   58
##              1    2    8
rf.bankrupcy=randomForest(Bankrupt.~.,data=dataTR,mtry=4,ntree=500,nodesize=5)
plot(rf.bankrupcy)

varImpPlot(rf.bankrupcy)

pred.bankrupcy = predict(rf.bankrupcy,newdata=dataTE)
table(pred.bankrupcy,dataTE[[1]])
##               
## pred.bankrupcy    0    1
##              0 1978   58
##              1    2    8
rf.bankrupcy=randomForest(Bankrupt.~.,data=dataTR,mtry=3,ntree=500,nodesize=5)
plot(rf.bankrupcy)

varImpPlot(rf.bankrupcy)

pred.bankrupcy = predict(rf.bankrupcy,newdata=dataTE)
table(pred.bankrupcy,dataTE[[1]])
##               
## pred.bankrupcy    0    1
##              0 1979   59
##              1    1    7
rf.bankrupcy=randomForest(Bankrupt.~.,data=dataTR,mtry=5,ntree=500,nodesize=5)
plot(rf.bankrupcy)

varImpPlot(rf.bankrupcy)

pred.bankrupcy = predict(rf.bankrupcy,newdata=dataTE)
table(pred.bankrupcy,dataTE$Bankrupt.)
##               
## pred.bankrupcy    0    1
##              0 1977   58
##              1    3    8
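
The five runs above differ only in `mtry`, so they can be written as one loop. The sketch below is an illustration under assumptions: it uses two `iris` species as a stand-in binary problem (not the bankruptcy data), and the names `results`, `train` and `test` are hypothetical. `randomForest()` does not run cross-validation itself (controls such as `trainControl()` belong to `caret::train()`); its out-of-bag error, stored in `err.rate`, plays a similar role and is reported next to the test-set confusion table.

```r
library(randomForest)

# Stand-in data: two iris species as a binary problem (not the bankruptcy data).
d <- droplevels(iris[iris$Species != "setosa", ])
set.seed(582)                      # makes the split and the forests reproducible
test_rows <- sample(nrow(d), 30)
train <- d[-test_rows, ]
test  <- d[test_rows, ]

# Fit one forest per mtry value and keep its OOB error and test confusion table.
results <- lapply(c(1, 2, 3, 4), function(m) {
  fit  <- randomForest(Species ~ ., data = train, mtry = m,
                       ntree = 500, nodesize = 5)
  pred <- predict(fit, newdata = test)
  list(mtry = m,
       oob_error = unname(fit$err.rate[fit$ntree, "OOB"]),  # OOB error after the last tree
       confusion = table(predicted = pred, actual = test$Species))
})

# Print the OOB error and the confusion table for each mtry value.
for (r in results) {
  cat("mtry =", r$mtry, " OOB error =", round(r$oob_error, 3), "\n")
  print(r$confusion)
}
```

Setting the seed once before the loop makes the comparison repeatable; without it, differences of one or two test cases between `mtry` values can come from the random bootstrap alone.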

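The five settings perform almost identically on the test set. With mtry=5 the forest detects 8 of the 66 bankrupt companies, leaving 58 false negatives (bankrupt companies predicted as non-bankrupt) and 3 false positives; mtry=4 and mtry=7 reach the same 8 detections with only 2 false positives.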
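
To make the effect of the class imbalance explicit, the counts of the mtry = 5 table can be turned into accuracy, sensitivity and specificity. This is a small base-R sketch that takes bankruptcy (class 1) as the positive class; the variable names are illustrative.

```r
# Counts copied from the mtry = 5 table: rows = predicted, columns = actual.
cm <- matrix(c(1977, 3, 58, 8), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

tp <- cm["1", "1"]  # bankrupt, predicted bankrupt
fn <- cm["0", "1"]  # bankrupt, predicted non-bankrupt
fp <- cm["1", "0"]  # non-bankrupt, predicted bankrupt
tn <- cm["0", "0"]  # non-bankrupt, predicted non-bankrupt

accuracy    <- (tp + tn) / sum(cm)  # 1985 / 2046
sensitivity <- tp / (tp + fn)       # 8 / 66
specificity <- tn / (tn + fp)       # 1977 / 1980

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)
```

Accuracy is about 97% simply because 1980 of the 2046 test companies are not bankrupt, while sensitivity is only about 12%, which is the figure that matters when the goal is to find bankruptcies.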

noftrees=50
depth=5
learning_rate=0.2
sampling_fraction=0.5


#boosting_model=gbm(Bankrupt.~.,distribution="bernoulli", data=dataTR, n.trees = noftrees,interaction.depth = depth,cv.folds=10,class.stratify.cv=TRUE, 
                   #n.minobsinnode = 5, shrinkage =learning_rate,
                   #bag.fraction = sampling_fraction)
#boosting_model
#summary(boosting_model)

#pred.bankrupcy = predict.gbm(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)
#pred.bankrupcy = predict(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)

I believe there is a problem with the new version of the gbm package: the call aborts intermittently, sometimes working and sometimes not, so the boosting code is left commented out.
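
One possible cause, stated as an assumption rather than a diagnosis of this run: `gbm` with `distribution = "bernoulli"` is meant for a numeric 0/1 response, whereas the tree and forest models above need `Bankrupt.` as a factor. The sketch below uses synthetic data (the data frame `toy` and its columns `x1`, `x2` are hypothetical) and converts the factor to 0/1 before fitting with the same hyperparameters as above. `cv.folds` is left out to keep the example small.

```r
library(gbm)

# Synthetic stand-in for dataTR with a factor response, as used for rpart/randomForest.
set.seed(582)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Bankrupt. <- factor(rbinom(n, 1, plogis(-2 + 2 * toy$x1)))

# Assumption: "bernoulli" wants a numeric 0/1 response, so convert the factor first.
toy$Bankrupt. <- as.numeric(as.character(toy$Bankrupt.))

boosting_model <- gbm(Bankrupt. ~ ., distribution = "bernoulli", data = toy,
                      n.trees = 50, interaction.depth = 5, n.minobsinnode = 5,
                      shrinkage = 0.2, bag.fraction = 0.5)

# type = "response" returns P(Bankrupt. = 1); n.trees is passed explicitly.
p <- predict(boosting_model, newdata = toy, n.trees = 50, type = "response")
pred <- as.integer(p > 0.5)
table(predicted = pred, actual = toy$Bankrupt.)
```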

## Breast Cancer Dataset

This dataset is concerned with predicting whether a breast tumor is malignant or benign.

data <- data.table(read.csv("breastc.csv",stringsAsFactors=T))
str(data)
## Classes 'data.table' and 'data.frame':   569 obs. of  33 variables:
##  $ id                     : int  842302 842517 84300903 84348301 84358402 843786 844359 84458202 844981 84501001 ...
##  $ diagnosis              : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ radius_mean            : num  18 20.6 19.7 11.4 20.3 ...
##  $ texture_mean           : num  10.4 17.8 21.2 20.4 14.3 ...
##  $ perimeter_mean         : num  122.8 132.9 130 77.6 135.1 ...
##  $ area_mean              : num  1001 1326 1203 386 1297 ...
##  $ smoothness_mean        : num  0.1184 0.0847 0.1096 0.1425 0.1003 ...
##  $ compactness_mean       : num  0.2776 0.0786 0.1599 0.2839 0.1328 ...
##  $ concavity_mean         : num  0.3001 0.0869 0.1974 0.2414 0.198 ...
##  $ concave.points_mean    : num  0.1471 0.0702 0.1279 0.1052 0.1043 ...
##  $ symmetry_mean          : num  0.242 0.181 0.207 0.26 0.181 ...
##  $ fractal_dimension_mean : num  0.0787 0.0567 0.06 0.0974 0.0588 ...
##  $ radius_se              : num  1.095 0.543 0.746 0.496 0.757 ...
##  $ texture_se             : num  0.905 0.734 0.787 1.156 0.781 ...
##  $ perimeter_se           : num  8.59 3.4 4.58 3.44 5.44 ...
##  $ area_se                : num  153.4 74.1 94 27.2 94.4 ...
##  $ smoothness_se          : num  0.0064 0.00522 0.00615 0.00911 0.01149 ...
##  $ compactness_se         : num  0.049 0.0131 0.0401 0.0746 0.0246 ...
##  $ concavity_se           : num  0.0537 0.0186 0.0383 0.0566 0.0569 ...
##  $ concave.points_se      : num  0.0159 0.0134 0.0206 0.0187 0.0188 ...
##  $ symmetry_se            : num  0.03 0.0139 0.0225 0.0596 0.0176 ...
##  $ fractal_dimension_se   : num  0.00619 0.00353 0.00457 0.00921 0.00511 ...
##  $ radius_worst           : num  25.4 25 23.6 14.9 22.5 ...
##  $ texture_worst          : num  17.3 23.4 25.5 26.5 16.7 ...
##  $ perimeter_worst        : num  184.6 158.8 152.5 98.9 152.2 ...
##  $ area_worst             : num  2019 1956 1709 568 1575 ...
##  $ smoothness_worst       : num  0.162 0.124 0.144 0.21 0.137 ...
##  $ compactness_worst      : num  0.666 0.187 0.424 0.866 0.205 ...
##  $ concavity_worst        : num  0.712 0.242 0.45 0.687 0.4 ...
##  $ concave.points_worst   : num  0.265 0.186 0.243 0.258 0.163 ...
##  $ symmetry_worst         : num  0.46 0.275 0.361 0.664 0.236 ...
##  $ fractal_dimension_worst: num  0.1189 0.089 0.0876 0.173 0.0768 ...
##  $ X                      : logi  NA NA NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr>
data <- data[,-c(1,33)]
set.seed(582) #using caTools performing stratified sampling
split=sample.split(data$diagnosis, SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
str(dataTR)
## Classes 'data.table' and 'data.frame':   398 obs. of  31 variables:
##  $ diagnosis              : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ radius_mean            : num  20.6 19.7 11.4 20.3 12.4 ...
##  $ texture_mean           : num  17.8 21.2 20.4 14.3 15.7 ...
##  $ perimeter_mean         : num  132.9 130 77.6 135.1 82.6 ...
##  $ area_mean              : num  1326 1203 386 1297 477 ...
##  $ smoothness_mean        : num  0.0847 0.1096 0.1425 0.1003 0.1278 ...
##  $ compactness_mean       : num  0.0786 0.1599 0.2839 0.1328 0.17 ...
##  $ concavity_mean         : num  0.0869 0.1974 0.2414 0.198 0.1578 ...
##  $ concave.points_mean    : num  0.0702 0.1279 0.1052 0.1043 0.0809 ...
##  $ symmetry_mean          : num  0.181 0.207 0.26 0.181 0.209 ...
##  $ fractal_dimension_mean : num  0.0567 0.06 0.0974 0.0588 0.0761 ...
##  $ radius_se              : num  0.543 0.746 0.496 0.757 0.335 ...
##  $ texture_se             : num  0.734 0.787 1.156 0.781 0.89 ...
##  $ perimeter_se           : num  3.4 4.58 3.44 5.44 2.22 ...
##  $ area_se                : num  74.1 94 27.2 94.4 27.2 ...
##  $ smoothness_se          : num  0.00522 0.00615 0.00911 0.01149 0.00751 ...
##  $ compactness_se         : num  0.0131 0.0401 0.0746 0.0246 0.0335 ...
##  $ concavity_se           : num  0.0186 0.0383 0.0566 0.0569 0.0367 ...
##  $ concave.points_se      : num  0.0134 0.0206 0.0187 0.0188 0.0114 ...
##  $ symmetry_se            : num  0.0139 0.0225 0.0596 0.0176 0.0216 ...
##  $ fractal_dimension_se   : num  0.00353 0.00457 0.00921 0.00511 0.00508 ...
##  $ radius_worst           : num  25 23.6 14.9 22.5 15.5 ...
##  $ texture_worst          : num  23.4 25.5 26.5 16.7 23.8 ...
##  $ perimeter_worst        : num  158.8 152.5 98.9 152.2 103.4 ...
##  $ area_worst             : num  1956 1709 568 1575 742 ...
##  $ smoothness_worst       : num  0.124 0.144 0.21 0.137 0.179 ...
##  $ compactness_worst      : num  0.187 0.424 0.866 0.205 0.525 ...
##  $ concavity_worst        : num  0.242 0.45 0.687 0.4 0.535 ...
##  $ concave.points_worst   : num  0.186 0.243 0.258 0.163 0.174 ...
##  $ symmetry_worst         : num  0.275 0.361 0.664 0.236 0.399 ...
##  $ fractal_dimension_worst: num  0.089 0.0876 0.173 0.0768 0.1244 ...
##  - attr(*, ".internal.selfref")=<externalptr>
str(dataTE)
## Classes 'data.table' and 'data.frame':   171 obs. of  31 variables:
##  $ diagnosis              : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 1 1 2 ...
##  $ radius_mean            : num  18 12.5 15.8 13.7 14.5 ...
##  $ texture_mean           : num  10.4 24 23.9 22.6 27.5 ...
##  $ perimeter_mean         : num  122.8 84 103.7 93.6 96.7 ...
##  $ area_mean              : num  1001 476 783 578 659 ...
##  $ smoothness_mean        : num  0.118 0.119 0.084 0.113 0.114 ...
##  $ compactness_mean       : num  0.278 0.24 0.1 0.229 0.16 ...
##  $ concavity_mean         : num  0.3001 0.2273 0.0994 0.2128 0.1639 ...
##  $ concave.points_mean    : num  0.1471 0.0854 0.0536 0.0803 0.0736 ...
##  $ symmetry_mean          : num  0.242 0.203 0.185 0.207 0.23 ...
##  $ fractal_dimension_mean : num  0.0787 0.0824 0.0534 0.0768 0.0708 ...
##  $ radius_se              : num  1.095 0.298 0.403 0.212 0.37 ...
##  $ texture_se             : num  0.905 1.599 1.078 1.169 1.033 ...
##  $ perimeter_se           : num  8.59 2.04 2.9 2.06 2.88 ...
##  $ area_se                : num  153.4 23.9 36.6 19.2 32.5 ...
##  $ smoothness_se          : num  0.0064 0.00715 0.00977 0.00643 0.00561 ...
##  $ compactness_se         : num  0.049 0.0722 0.0313 0.0594 0.0424 ...
##  $ concavity_se           : num  0.0537 0.0774 0.0505 0.055 0.0474 ...
##  $ concave.points_se      : num  0.0159 0.0143 0.0199 0.0163 0.0109 ...
##  $ symmetry_se            : num  0.03 0.0179 0.0298 0.0196 0.0186 ...
##  $ fractal_dimension_se   : num  0.00619 0.01008 0.003 0.00809 0.00547 ...
##  $ radius_worst           : num  25.4 15.1 16.8 15 17.5 ...
##  $ texture_worst          : num  17.3 40.7 27.7 32 37.1 ...
##  $ perimeter_worst        : num  184.6 97.7 112 108.8 124.1 ...
##  $ area_worst             : num  2019 711 876 698 943 ...
##  $ smoothness_worst       : num  0.162 0.185 0.113 0.165 0.168 ...
##  $ compactness_worst      : num  0.666 1.058 0.192 0.772 0.658 ...
##  $ concavity_worst        : num  0.712 1.105 0.232 0.694 0.703 ...
##  $ concave.points_worst   : num  0.265 0.221 0.112 0.221 0.171 ...
##  $ symmetry_worst         : num  0.46 0.437 0.281 0.36 0.422 ...
##  $ fractal_dimension_worst: num  0.1189 0.2075 0.0629 0.1431 0.1341 ...
##  - attr(*, ".internal.selfref")=<externalptr>
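
What "stratified" means for this split can be sketched in base R: sample 70% of the rows within each class, so that both parts keep the class proportions of the full data. The helper `stratified_split` is hypothetical and is not how `caTools::sample.split` is necessarily implemented; the class counts (357 benign, 212 malignant) are the sums of the training and test counts reported in this section.

```r
# Hypothetical helper (not from caTools): draw `ratio` of each class for training.
stratified_split <- function(y, ratio = 0.7) {
  by_class <- split(seq_along(y), y)       # row indices grouped by class
  train_idx <- unlist(lapply(by_class, function(i) {
    i[sample.int(length(i), round(ratio * length(i)))]
  }))
  seq_along(y) %in% train_idx              # TRUE marks a training row
}

set.seed(582)
# Class counts of the full data: 357 benign, 212 malignant (569 rows).
diagnosis <- factor(rep(c("B", "M"), times = c(357, 212)))
in_train <- stratified_split(diagnosis)

table(diagnosis[in_train])                          # 250 B and 148 M, 398 rows in total
round(prop.table(table(diagnosis[in_train])), 3)    # class shares in the training part
round(prop.table(table(diagnosis[!in_train])), 3)   # class shares in the test part
```

The per-class sample sizes are fixed by the rounding, so the 250/148 training counts do not depend on the seed; only which rows are chosen does.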
#knn

knnFit <- train(diagnosis~ ., data = dataTR, method = "knn", trControl = trainControl(method = "cv"),preProcess = c("center","scale"), tuneGrid = expand.grid(k=c(3,5,7,9,11)))
knnFit #k=5 is chosen for the neighborhood parameter
## k-Nearest Neighbors 
## 
## 398 samples
##  30 predictor
##   2 classes: 'B', 'M' 
## 
## Pre-processing: centered (30), scaled (30) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 359, 358, 358, 359, 358, 358, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    3  0.9597436  0.9121877
##    5  0.9622436  0.9179547
##    7  0.9572436  0.9075784
##    9  0.9522436  0.8960268
##   11  0.9547436  0.9014303
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
y <- predict(knnFit,newdata=dataTE)
y
##   [1] M M M M M M M B B M M M M B M M M B B B M B M M B M B B B B B B M B M B B
##  [38] B M B B M B M B M M B M M B M B B M M B B M B B B B M M M M M B M M M M B
##  [75] B B B B B B B B B B B M B B B B B B B B B B B B M M B B B B M M M B B B B
## [112] B M B B B B M B B B B B M B B M B B B B M B B M M B B B B B B B M B B B B
## [149] M B B B B M M M M B B B B B B B B B B B B M M
## Levels: B M
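table(y,dataTE$diagnosis) #rows are predictions, columns are true labels: all 107 benign cases are predicted correctly, and 3 of the 64 malignant cases are predicted as benign. Missing a malignant case is the costlier error, so these 3 false negatives matter more than benign cases flagged as malignant would.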
##    
## y     B   M
##   B 107   3
##   M   0  61
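
The `preProcess = c("center","scale")` step in the kNN call matters because kNN compares instances by distance, and features such as `area_mean` (hundreds) would otherwise swamp features such as `smoothness_mean` (around 0.1). The base-R sketch below illustrates the idea on a few values taken from the `str()` output above; it standardizes with the training mean and standard deviation and applies the same transformation to the test rows. It is an illustration of the principle, not a reproduction of caret's internals.

```r
# A few training and test values of two features, copied from the str() output.
train <- data.frame(area_mean       = c(1326, 1203, 386, 1297, 477),
                    smoothness_mean = c(0.0847, 0.1096, 0.1425, 0.1003, 0.1278))
test  <- data.frame(area_mean       = c(1001, 783),
                    smoothness_mean = c(0.118, 0.084))

# Centering and scaling constants come from the training data only.
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

train_z <- scale(train, center = mu, scale = sigma)
test_z  <- scale(test,  center = mu, scale = sigma)

# Share of the squared Euclidean distance between the two test rows
# that is contributed by area_mean, before and after standardizing.
raw_diff  <- unlist(test[1, ] - test[2, ])
z_diff    <- test_z[1, ] - test_z[2, ]
raw_share <- raw_diff["area_mean"]^2 / sum(raw_diff^2)
z_share   <- z_diff["area_mean"]^2 / sum(z_diff^2)

round(c(raw = unname(raw_share), standardized = unname(z_share)), 3)
```

On the raw scale almost the entire distance comes from `area_mean`; after standardizing, `smoothness_mean` contributes most of it for this pair of rows.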
###rpart

tree1=rpart(dataTR$diagnosis~.,method="class",data=dataTR,maxdepth=4,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$diagnosis ~ ., data = dataTR, method = "class", 
##     maxdepth = 4, xval = 10)
##   n= 398 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.77702703      0 1.0000000 1.0000000 0.06514748
## 2 0.06081081      1 0.2229730 0.3445946 0.04505537
## 3 0.01013514      2 0.1621622 0.2364865 0.03817544
## 4 0.01000000      4 0.1418919 0.2432432 0.03866360
## 
## Variable importance
##      perimeter_worst         radius_worst           area_worst 
##                   17                   16                   15 
##       perimeter_mean            area_mean          radius_mean 
##                   15                   15                   15 
## concave.points_worst    compactness_worst      concavity_worst 
##                    2                    1                    1 
##  concave.points_mean       concavity_mean       symmetry_worst 
##                    1                    1                    1 
## 
## Node number 1: 398 observations,    complexity param=0.777027
##   predicted class=B  expected loss=0.3718593  P(node) =1
##     class counts:   250   148
##    probabilities: 0.628 0.372 
##   left son=2 (265 obs) right son=3 (133 obs)
##   Primary splits:
##       perimeter_worst      < 112.8    to the left,  improve=125.4949, (0 missing)
##       radius_worst         < 16.795   to the left,  improve=123.9590, (0 missing)
##       concave.points_mean  < 0.05142  to the left,  improve=123.4977, (0 missing)
##       concave.points_worst < 0.14655  to the left,  improve=122.8816, (0 missing)
##       area_worst           < 880.95   to the left,  improve=122.4873, (0 missing)
##   Surrogate splits:
##       radius_worst   < 17.245   to the left,  agree=0.970, adj=0.910, (0 split)
##       area_worst     < 906.9    to the left,  agree=0.967, adj=0.902, (0 split)
##       perimeter_mean < 96.405   to the left,  agree=0.947, adj=0.842, (0 split)
##       radius_mean    < 14.995   to the left,  agree=0.940, adj=0.820, (0 split)
##       area_mean      < 700.35   to the left,  agree=0.940, adj=0.820, (0 split)
## 
## Node number 2: 265 observations,    complexity param=0.06081081
##   predicted class=B  expected loss=0.09056604  P(node) =0.6658291
##     class counts:   241    24
##    probabilities: 0.909 0.091 
##   left son=4 (246 obs) right son=5 (19 obs)
##   Primary splits:
##       concave.points_worst < 0.14705  to the left,  improve=17.09742, (0 missing)
##       concave.points_mean  < 0.04923  to the left,  improve=13.06752, (0 missing)
##       perimeter_worst      < 102.05   to the left,  improve=10.70589, (0 missing)
##       concavity_mean       < 0.082405 to the left,  improve=10.61901, (0 missing)
##       compactness_worst    < 0.3901   to the left,  improve=10.55416, (0 missing)
##   Surrogate splits:
##       compactness_worst   < 0.3901   to the left,  agree=0.970, adj=0.579, (0 split)
##       concavity_worst     < 0.44225  to the left,  agree=0.966, adj=0.526, (0 split)
##       concavity_mean      < 0.13735  to the left,  agree=0.958, adj=0.421, (0 split)
##       concave.points_mean < 0.0661   to the left,  agree=0.958, adj=0.421, (0 split)
##       symmetry_worst      < 0.3617   to the left,  agree=0.955, adj=0.368, (0 split)
## 
## Node number 3: 133 observations
##   predicted class=M  expected loss=0.06766917  P(node) =0.3341709
##     class counts:     9   124
##    probabilities: 0.068 0.932 
## 
## Node number 4: 246 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.04065041  P(node) =0.6180905
##     class counts:   236    10
##    probabilities: 0.959 0.041 
##   left son=8 (215 obs) right son=9 (31 obs)
##   Primary splits:
##       perimeter_worst      < 102.05   to the left,  improve=2.432003, (0 missing)
##       area_se              < 35.435   to the left,  improve=2.346509, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=2.241196, (0 missing)
##       area_worst           < 727.1    to the left,  improve=2.072300, (0 missing)
##       radius_worst         < 15.78    to the left,  improve=1.905209, (0 missing)
##   Surrogate splits:
##       radius_worst   < 15.615   to the left,  agree=0.984, adj=0.871, (0 split)
##       area_worst     < 727.1    to the left,  agree=0.984, adj=0.871, (0 split)
##       perimeter_mean < 90.365   to the left,  agree=0.980, adj=0.839, (0 split)
##       radius_mean    < 14.165   to the left,  agree=0.972, adj=0.774, (0 split)
##       area_mean      < 620.2    to the left,  agree=0.972, adj=0.774, (0 split)
## 
## Node number 5: 19 observations
##   predicted class=M  expected loss=0.2631579  P(node) =0.04773869
##     class counts:     5    14
##    probabilities: 0.263 0.737 
## 
## Node number 8: 215 observations
##   predicted class=B  expected loss=0.01395349  P(node) =0.540201
##     class counts:   212     3
##    probabilities: 0.986 0.014 
## 
## Node number 9: 31 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.2258065  P(node) =0.07788945
##     class counts:    24     7
##    probabilities: 0.774 0.226 
##   left son=18 (24 obs) right son=19 (7 obs)
##   Primary splits:
##       radius_mean          < 14.305   to the right, improve=4.314900, (0 missing)
##       area_mean            < 636.9    to the right, improve=4.314900, (0 missing)
##       perimeter_mean       < 94.265   to the right, improve=3.838710, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=3.838710, (0 missing)
##       texture_worst        < 29.18    to the left,  improve=3.436536, (0 missing)
##   Surrogate splits:
##       area_mean        < 636.9    to the right, agree=1.000, adj=1.000, (0 split)
##       perimeter_mean   < 91.97    to the right, agree=0.935, adj=0.714, (0 split)
##       texture_worst    < 33.28    to the left,  agree=0.871, adj=0.429, (0 split)
##       smoothness_worst < 0.1367   to the left,  agree=0.839, adj=0.286, (0 split)
##       texture_mean     < 24.5     to the left,  agree=0.806, adj=0.143, (0 split)
## 
## Node number 18: 24 observations
##   predicted class=B  expected loss=0.08333333  P(node) =0.06030151
##     class counts:    22     2
##    probabilities: 0.917 0.083 
## 
## Node number 19: 7 observations
##   predicted class=M  expected loss=0.2857143  P(node) =0.01758794
##     class counts:     2     5
##    probabilities: 0.286 0.714
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

# predict.rpart() has no `method` argument; class probabilities are requested with type="prob"
y <- predict(tree1,type="prob",newdata=dataTE)

# label a case "B" when P(B) > 0.5, otherwise "M"
y1 <- ifelse(y[,"B"]>0.5,"B","M")
table(y1,dataTE$diagnosis)
##    
## y1   B  M
##   B 98  4
##   M  9 60
length(y1)
## [1] 171
tree1=rpart(dataTR$diagnosis~.,method="class",data=dataTR,maxdepth=3,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$diagnosis ~ ., data = dataTR, method = "class", 
##     maxdepth = 3, xval = 10)
##   n= 398 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.77702703      0 1.0000000 1.0000000 0.06514748
## 2 0.06081081      1 0.2229730 0.3851351 0.04721841
## 3 0.01000000      2 0.1621622 0.2567568 0.03961319
## 
## Variable importance
##      perimeter_worst         radius_worst           area_worst 
##                   17                   16                   16 
##       perimeter_mean            area_mean          radius_mean 
##                   15                   14                   14 
## concave.points_worst    compactness_worst      concavity_worst 
##                    2                    1                    1 
##  concave.points_mean       concavity_mean       symmetry_worst 
##                    1                    1                    1 
## 
## Node number 1: 398 observations,    complexity param=0.777027
##   predicted class=B  expected loss=0.3718593  P(node) =1
##     class counts:   250   148
##    probabilities: 0.628 0.372 
##   left son=2 (265 obs) right son=3 (133 obs)
##   Primary splits:
##       perimeter_worst      < 112.8    to the left,  improve=125.4949, (0 missing)
##       radius_worst         < 16.795   to the left,  improve=123.9590, (0 missing)
##       concave.points_mean  < 0.05142  to the left,  improve=123.4977, (0 missing)
##       concave.points_worst < 0.14655  to the left,  improve=122.8816, (0 missing)
##       area_worst           < 880.95   to the left,  improve=122.4873, (0 missing)
##   Surrogate splits:
##       radius_worst   < 17.245   to the left,  agree=0.970, adj=0.910, (0 split)
##       area_worst     < 906.9    to the left,  agree=0.967, adj=0.902, (0 split)
##       perimeter_mean < 96.405   to the left,  agree=0.947, adj=0.842, (0 split)
##       radius_mean    < 14.995   to the left,  agree=0.940, adj=0.820, (0 split)
##       area_mean      < 700.35   to the left,  agree=0.940, adj=0.820, (0 split)
## 
## Node number 2: 265 observations,    complexity param=0.06081081
##   predicted class=B  expected loss=0.09056604  P(node) =0.6658291
##     class counts:   241    24
##    probabilities: 0.909 0.091 
##   left son=4 (246 obs) right son=5 (19 obs)
##   Primary splits:
##       concave.points_worst < 0.14705  to the left,  improve=17.09742, (0 missing)
##       concave.points_mean  < 0.04923  to the left,  improve=13.06752, (0 missing)
##       perimeter_worst      < 102.05   to the left,  improve=10.70589, (0 missing)
##       concavity_mean       < 0.082405 to the left,  improve=10.61901, (0 missing)
##       compactness_worst    < 0.3901   to the left,  improve=10.55416, (0 missing)
##   Surrogate splits:
##       compactness_worst   < 0.3901   to the left,  agree=0.970, adj=0.579, (0 split)
##       concavity_worst     < 0.44225  to the left,  agree=0.966, adj=0.526, (0 split)
##       concavity_mean      < 0.13735  to the left,  agree=0.958, adj=0.421, (0 split)
##       concave.points_mean < 0.0661   to the left,  agree=0.958, adj=0.421, (0 split)
##       symmetry_worst      < 0.3617   to the left,  agree=0.955, adj=0.368, (0 split)
## 
## Node number 3: 133 observations
##   predicted class=M  expected loss=0.06766917  P(node) =0.3341709
##     class counts:     9   124
##    probabilities: 0.068 0.932 
## 
## Node number 4: 246 observations
##   predicted class=B  expected loss=0.04065041  P(node) =0.6180905
##     class counts:   236    10
##    probabilities: 0.959 0.041 
## 
## Node number 5: 19 observations
##   predicted class=M  expected loss=0.2631579  P(node) =0.04773869
##     class counts:     5    14
##    probabilities: 0.263 0.737
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

y <- predict(tree1,type="prob",newdata=dataTE)

# label a case "B" when P(B) > 0.5, otherwise "M"
y1 <- ifelse(y[,"B"]>0.5,"B","M")
table(y1,dataTE$diagnosis)
##    
## y1    B   M
##   B 101   4
##   M   6  60
length(y1)
## [1] 171
tree1=rpart(dataTR$diagnosis~.,method="class",data=dataTR,maxdepth=5,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$diagnosis ~ ., data = dataTR, method = "class", 
##     maxdepth = 5, xval = 10)
##   n= 398 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.77702703      0 1.0000000 1.0000000 0.06514748
## 2 0.06081081      1 0.2229730 0.3716216 0.04651844
## 3 0.01013514      2 0.1621622 0.2364865 0.03817544
## 4 0.01000000      4 0.1418919 0.2364865 0.03817544
## 
## Variable importance
##      perimeter_worst         radius_worst           area_worst 
##                   17                   16                   15 
##       perimeter_mean            area_mean          radius_mean 
##                   15                   15                   15 
## concave.points_worst    compactness_worst      concavity_worst 
##                    2                    1                    1 
##  concave.points_mean       concavity_mean       symmetry_worst 
##                    1                    1                    1 
## 
## Node number 1: 398 observations,    complexity param=0.777027
##   predicted class=B  expected loss=0.3718593  P(node) =1
##     class counts:   250   148
##    probabilities: 0.628 0.372 
##   left son=2 (265 obs) right son=3 (133 obs)
##   Primary splits:
##       perimeter_worst      < 112.8    to the left,  improve=125.4949, (0 missing)
##       radius_worst         < 16.795   to the left,  improve=123.9590, (0 missing)
##       concave.points_mean  < 0.05142  to the left,  improve=123.4977, (0 missing)
##       concave.points_worst < 0.14655  to the left,  improve=122.8816, (0 missing)
##       area_worst           < 880.95   to the left,  improve=122.4873, (0 missing)
##   Surrogate splits:
##       radius_worst   < 17.245   to the left,  agree=0.970, adj=0.910, (0 split)
##       area_worst     < 906.9    to the left,  agree=0.967, adj=0.902, (0 split)
##       perimeter_mean < 96.405   to the left,  agree=0.947, adj=0.842, (0 split)
##       radius_mean    < 14.995   to the left,  agree=0.940, adj=0.820, (0 split)
##       area_mean      < 700.35   to the left,  agree=0.940, adj=0.820, (0 split)
## 
## Node number 2: 265 observations,    complexity param=0.06081081
##   predicted class=B  expected loss=0.09056604  P(node) =0.6658291
##     class counts:   241    24
##    probabilities: 0.909 0.091 
##   left son=4 (246 obs) right son=5 (19 obs)
##   Primary splits:
##       concave.points_worst < 0.14705  to the left,  improve=17.09742, (0 missing)
##       concave.points_mean  < 0.04923  to the left,  improve=13.06752, (0 missing)
##       perimeter_worst      < 102.05   to the left,  improve=10.70589, (0 missing)
##       concavity_mean       < 0.082405 to the left,  improve=10.61901, (0 missing)
##       compactness_worst    < 0.3901   to the left,  improve=10.55416, (0 missing)
##   Surrogate splits:
##       compactness_worst   < 0.3901   to the left,  agree=0.970, adj=0.579, (0 split)
##       concavity_worst     < 0.44225  to the left,  agree=0.966, adj=0.526, (0 split)
##       concavity_mean      < 0.13735  to the left,  agree=0.958, adj=0.421, (0 split)
##       concave.points_mean < 0.0661   to the left,  agree=0.958, adj=0.421, (0 split)
##       symmetry_worst      < 0.3617   to the left,  agree=0.955, adj=0.368, (0 split)
## 
## Node number 3: 133 observations
##   predicted class=M  expected loss=0.06766917  P(node) =0.3341709
##     class counts:     9   124
##    probabilities: 0.068 0.932 
## 
## Node number 4: 246 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.04065041  P(node) =0.6180905
##     class counts:   236    10
##    probabilities: 0.959 0.041 
##   left son=8 (215 obs) right son=9 (31 obs)
##   Primary splits:
##       perimeter_worst      < 102.05   to the left,  improve=2.432003, (0 missing)
##       area_se              < 35.435   to the left,  improve=2.346509, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=2.241196, (0 missing)
##       area_worst           < 727.1    to the left,  improve=2.072300, (0 missing)
##       radius_worst         < 15.78    to the left,  improve=1.905209, (0 missing)
##   Surrogate splits:
##       radius_worst   < 15.615   to the left,  agree=0.984, adj=0.871, (0 split)
##       area_worst     < 727.1    to the left,  agree=0.984, adj=0.871, (0 split)
##       perimeter_mean < 90.365   to the left,  agree=0.980, adj=0.839, (0 split)
##       radius_mean    < 14.165   to the left,  agree=0.972, adj=0.774, (0 split)
##       area_mean      < 620.2    to the left,  agree=0.972, adj=0.774, (0 split)
## 
## Node number 5: 19 observations
##   predicted class=M  expected loss=0.2631579  P(node) =0.04773869
##     class counts:     5    14
##    probabilities: 0.263 0.737 
## 
## Node number 8: 215 observations
##   predicted class=B  expected loss=0.01395349  P(node) =0.540201
##     class counts:   212     3
##    probabilities: 0.986 0.014 
## 
## Node number 9: 31 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.2258065  P(node) =0.07788945
##     class counts:    24     7
##    probabilities: 0.774 0.226 
##   left son=18 (24 obs) right son=19 (7 obs)
##   Primary splits:
##       radius_mean          < 14.305   to the right, improve=4.314900, (0 missing)
##       area_mean            < 636.9    to the right, improve=4.314900, (0 missing)
##       perimeter_mean       < 94.265   to the right, improve=3.838710, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=3.838710, (0 missing)
##       texture_worst        < 29.18    to the left,  improve=3.436536, (0 missing)
##   Surrogate splits:
##       area_mean        < 636.9    to the right, agree=1.000, adj=1.000, (0 split)
##       perimeter_mean   < 91.97    to the right, agree=0.935, adj=0.714, (0 split)
##       texture_worst    < 33.28    to the left,  agree=0.871, adj=0.429, (0 split)
##       smoothness_worst < 0.1367   to the left,  agree=0.839, adj=0.286, (0 split)
##       texture_mean     < 24.5     to the left,  agree=0.806, adj=0.143, (0 split)
## 
## Node number 18: 24 observations
##   predicted class=B  expected loss=0.08333333  P(node) =0.06030151
##     class counts:    22     2
##    probabilities: 0.917 0.083 
## 
## Node number 19: 7 observations
##   predicted class=M  expected loss=0.2857143  P(node) =0.01758794
##     class counts:     2     5
##    probabilities: 0.286 0.714
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
y <- predict(tree1,type="prob",newdata=dataTE)

# label a case "B" when P(B) > 0.5, otherwise "M"
y1 <- ifelse(y[,"B"]>0.5,"B","M")
table(y1,dataTE$diagnosis)
##    
## y1   B  M
##   B 98  4
##   M  9 60
length(y1)
## [1] 171
tree1=rpart(dataTR$diagnosis~.,method="class",data=dataTR,maxdepth=6,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$diagnosis ~ ., data = dataTR, method = "class", 
##     maxdepth = 6, xval = 10)
##   n= 398 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.77702703      0 1.0000000 1.0000000 0.06514748
## 2 0.06081081      1 0.2229730 0.3783784 0.04687096
## 3 0.01013514      2 0.1621622 0.2567568 0.03961319
## 4 0.01000000      4 0.1418919 0.2364865 0.03817544
## 
## Variable importance
##      perimeter_worst         radius_worst           area_worst 
##                   17                   16                   15 
##       perimeter_mean            area_mean          radius_mean 
##                   15                   15                   15 
## concave.points_worst    compactness_worst      concavity_worst 
##                    2                    1                    1 
##  concave.points_mean       concavity_mean       symmetry_worst 
##                    1                    1                    1 
## 
## Node number 1: 398 observations,    complexity param=0.777027
##   predicted class=B  expected loss=0.3718593  P(node) =1
##     class counts:   250   148
##    probabilities: 0.628 0.372 
##   left son=2 (265 obs) right son=3 (133 obs)
##   Primary splits:
##       perimeter_worst      < 112.8    to the left,  improve=125.4949, (0 missing)
##       radius_worst         < 16.795   to the left,  improve=123.9590, (0 missing)
##       concave.points_mean  < 0.05142  to the left,  improve=123.4977, (0 missing)
##       concave.points_worst < 0.14655  to the left,  improve=122.8816, (0 missing)
##       area_worst           < 880.95   to the left,  improve=122.4873, (0 missing)
##   Surrogate splits:
##       radius_worst   < 17.245   to the left,  agree=0.970, adj=0.910, (0 split)
##       area_worst     < 906.9    to the left,  agree=0.967, adj=0.902, (0 split)
##       perimeter_mean < 96.405   to the left,  agree=0.947, adj=0.842, (0 split)
##       radius_mean    < 14.995   to the left,  agree=0.940, adj=0.820, (0 split)
##       area_mean      < 700.35   to the left,  agree=0.940, adj=0.820, (0 split)
## 
## Node number 2: 265 observations,    complexity param=0.06081081
##   predicted class=B  expected loss=0.09056604  P(node) =0.6658291
##     class counts:   241    24
##    probabilities: 0.909 0.091 
##   left son=4 (246 obs) right son=5 (19 obs)
##   Primary splits:
##       concave.points_worst < 0.14705  to the left,  improve=17.09742, (0 missing)
##       concave.points_mean  < 0.04923  to the left,  improve=13.06752, (0 missing)
##       perimeter_worst      < 102.05   to the left,  improve=10.70589, (0 missing)
##       concavity_mean       < 0.082405 to the left,  improve=10.61901, (0 missing)
##       compactness_worst    < 0.3901   to the left,  improve=10.55416, (0 missing)
##   Surrogate splits:
##       compactness_worst   < 0.3901   to the left,  agree=0.970, adj=0.579, (0 split)
##       concavity_worst     < 0.44225  to the left,  agree=0.966, adj=0.526, (0 split)
##       concavity_mean      < 0.13735  to the left,  agree=0.958, adj=0.421, (0 split)
##       concave.points_mean < 0.0661   to the left,  agree=0.958, adj=0.421, (0 split)
##       symmetry_worst      < 0.3617   to the left,  agree=0.955, adj=0.368, (0 split)
## 
## Node number 3: 133 observations
##   predicted class=M  expected loss=0.06766917  P(node) =0.3341709
##     class counts:     9   124
##    probabilities: 0.068 0.932 
## 
## Node number 4: 246 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.04065041  P(node) =0.6180905
##     class counts:   236    10
##    probabilities: 0.959 0.041 
##   left son=8 (215 obs) right son=9 (31 obs)
##   Primary splits:
##       perimeter_worst      < 102.05   to the left,  improve=2.432003, (0 missing)
##       area_se              < 35.435   to the left,  improve=2.346509, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=2.241196, (0 missing)
##       area_worst           < 727.1    to the left,  improve=2.072300, (0 missing)
##       radius_worst         < 15.78    to the left,  improve=1.905209, (0 missing)
##   Surrogate splits:
##       radius_worst   < 15.615   to the left,  agree=0.984, adj=0.871, (0 split)
##       area_worst     < 727.1    to the left,  agree=0.984, adj=0.871, (0 split)
##       perimeter_mean < 90.365   to the left,  agree=0.980, adj=0.839, (0 split)
##       radius_mean    < 14.165   to the left,  agree=0.972, adj=0.774, (0 split)
##       area_mean      < 620.2    to the left,  agree=0.972, adj=0.774, (0 split)
## 
## Node number 5: 19 observations
##   predicted class=M  expected loss=0.2631579  P(node) =0.04773869
##     class counts:     5    14
##    probabilities: 0.263 0.737 
## 
## Node number 8: 215 observations
##   predicted class=B  expected loss=0.01395349  P(node) =0.540201
##     class counts:   212     3
##    probabilities: 0.986 0.014 
## 
## Node number 9: 31 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.2258065  P(node) =0.07788945
##     class counts:    24     7
##    probabilities: 0.774 0.226 
##   left son=18 (24 obs) right son=19 (7 obs)
##   Primary splits:
##       radius_mean          < 14.305   to the right, improve=4.314900, (0 missing)
##       area_mean            < 636.9    to the right, improve=4.314900, (0 missing)
##       perimeter_mean       < 94.265   to the right, improve=3.838710, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=3.838710, (0 missing)
##       texture_worst        < 29.18    to the left,  improve=3.436536, (0 missing)
##   Surrogate splits:
##       area_mean        < 636.9    to the right, agree=1.000, adj=1.000, (0 split)
##       perimeter_mean   < 91.97    to the right, agree=0.935, adj=0.714, (0 split)
##       texture_worst    < 33.28    to the left,  agree=0.871, adj=0.429, (0 split)
##       smoothness_worst < 0.1367   to the left,  agree=0.839, adj=0.286, (0 split)
##       texture_mean     < 24.5     to the left,  agree=0.806, adj=0.143, (0 split)
## 
## Node number 18: 24 observations
##   predicted class=B  expected loss=0.08333333  P(node) =0.06030151
##     class counts:    22     2
##    probabilities: 0.917 0.083 
## 
## Node number 19: 7 observations
##   predicted class=M  expected loss=0.2857143  P(node) =0.01758794
##     class counts:     2     5
##    probabilities: 0.286 0.714
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

y <- predict(tree1,type="prob",newdata=dataTE)

# label a case "B" when P(B) > 0.5, otherwise "M"
y1 <- ifelse(y[,"B"]>0.5,"B","M")
table(y1,dataTE$diagnosis)
##    
## y1   B  M
##   B 98  4
##   M  9 60
length(y1)
## [1] 171
tree1=rpart(dataTR$diagnosis~.,method="class",data=dataTR,maxdepth=7,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$diagnosis ~ ., data = dataTR, method = "class", 
##     maxdepth = 7, xval = 10)
##   n= 398 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.77702703      0 1.0000000 1.0000000 0.06514748
## 2 0.06081081      1 0.2229730 0.3581081 0.04579776
## 3 0.01013514      2 0.1621622 0.2297297 0.03767792
## 4 0.01000000      4 0.1418919 0.2229730 0.03717065
## 
## Variable importance
##      perimeter_worst         radius_worst           area_worst 
##                   17                   16                   15 
##       perimeter_mean            area_mean          radius_mean 
##                   15                   15                   15 
## concave.points_worst    compactness_worst      concavity_worst 
##                    2                    1                    1 
##  concave.points_mean       concavity_mean       symmetry_worst 
##                    1                    1                    1 
## 
## Node number 1: 398 observations,    complexity param=0.777027
##   predicted class=B  expected loss=0.3718593  P(node) =1
##     class counts:   250   148
##    probabilities: 0.628 0.372 
##   left son=2 (265 obs) right son=3 (133 obs)
##   Primary splits:
##       perimeter_worst      < 112.8    to the left,  improve=125.4949, (0 missing)
##       radius_worst         < 16.795   to the left,  improve=123.9590, (0 missing)
##       concave.points_mean  < 0.05142  to the left,  improve=123.4977, (0 missing)
##       concave.points_worst < 0.14655  to the left,  improve=122.8816, (0 missing)
##       area_worst           < 880.95   to the left,  improve=122.4873, (0 missing)
##   Surrogate splits:
##       radius_worst   < 17.245   to the left,  agree=0.970, adj=0.910, (0 split)
##       area_worst     < 906.9    to the left,  agree=0.967, adj=0.902, (0 split)
##       perimeter_mean < 96.405   to the left,  agree=0.947, adj=0.842, (0 split)
##       radius_mean    < 14.995   to the left,  agree=0.940, adj=0.820, (0 split)
##       area_mean      < 700.35   to the left,  agree=0.940, adj=0.820, (0 split)
## 
## Node number 2: 265 observations,    complexity param=0.06081081
##   predicted class=B  expected loss=0.09056604  P(node) =0.6658291
##     class counts:   241    24
##    probabilities: 0.909 0.091 
##   left son=4 (246 obs) right son=5 (19 obs)
##   Primary splits:
##       concave.points_worst < 0.14705  to the left,  improve=17.09742, (0 missing)
##       concave.points_mean  < 0.04923  to the left,  improve=13.06752, (0 missing)
##       perimeter_worst      < 102.05   to the left,  improve=10.70589, (0 missing)
##       concavity_mean       < 0.082405 to the left,  improve=10.61901, (0 missing)
##       compactness_worst    < 0.3901   to the left,  improve=10.55416, (0 missing)
##   Surrogate splits:
##       compactness_worst   < 0.3901   to the left,  agree=0.970, adj=0.579, (0 split)
##       concavity_worst     < 0.44225  to the left,  agree=0.966, adj=0.526, (0 split)
##       concavity_mean      < 0.13735  to the left,  agree=0.958, adj=0.421, (0 split)
##       concave.points_mean < 0.0661   to the left,  agree=0.958, adj=0.421, (0 split)
##       symmetry_worst      < 0.3617   to the left,  agree=0.955, adj=0.368, (0 split)
## 
## Node number 3: 133 observations
##   predicted class=M  expected loss=0.06766917  P(node) =0.3341709
##     class counts:     9   124
##    probabilities: 0.068 0.932 
## 
## Node number 4: 246 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.04065041  P(node) =0.6180905
##     class counts:   236    10
##    probabilities: 0.959 0.041 
##   left son=8 (215 obs) right son=9 (31 obs)
##   Primary splits:
##       perimeter_worst      < 102.05   to the left,  improve=2.432003, (0 missing)
##       area_se              < 35.435   to the left,  improve=2.346509, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=2.241196, (0 missing)
##       area_worst           < 727.1    to the left,  improve=2.072300, (0 missing)
##       radius_worst         < 15.78    to the left,  improve=1.905209, (0 missing)
##   Surrogate splits:
##       radius_worst   < 15.615   to the left,  agree=0.984, adj=0.871, (0 split)
##       area_worst     < 727.1    to the left,  agree=0.984, adj=0.871, (0 split)
##       perimeter_mean < 90.365   to the left,  agree=0.980, adj=0.839, (0 split)
##       radius_mean    < 14.165   to the left,  agree=0.972, adj=0.774, (0 split)
##       area_mean      < 620.2    to the left,  agree=0.972, adj=0.774, (0 split)
## 
## Node number 5: 19 observations
##   predicted class=M  expected loss=0.2631579  P(node) =0.04773869
##     class counts:     5    14
##    probabilities: 0.263 0.737 
## 
## Node number 8: 215 observations
##   predicted class=B  expected loss=0.01395349  P(node) =0.540201
##     class counts:   212     3
##    probabilities: 0.986 0.014 
## 
## Node number 9: 31 observations,    complexity param=0.01013514
##   predicted class=B  expected loss=0.2258065  P(node) =0.07788945
##     class counts:    24     7
##    probabilities: 0.774 0.226 
##   left son=18 (24 obs) right son=19 (7 obs)
##   Primary splits:
##       radius_mean          < 14.305   to the right, improve=4.314900, (0 missing)
##       area_mean            < 636.9    to the right, improve=4.314900, (0 missing)
##       perimeter_mean       < 94.265   to the right, improve=3.838710, (0 missing)
##       concave.points_worst < 0.11085  to the left,  improve=3.838710, (0 missing)
##       texture_worst        < 29.18    to the left,  improve=3.436536, (0 missing)
##   Surrogate splits:
##       area_mean        < 636.9    to the right, agree=1.000, adj=1.000, (0 split)
##       perimeter_mean   < 91.97    to the right, agree=0.935, adj=0.714, (0 split)
##       texture_worst    < 33.28    to the left,  agree=0.871, adj=0.429, (0 split)
##       smoothness_worst < 0.1367   to the left,  agree=0.839, adj=0.286, (0 split)
##       texture_mean     < 24.5     to the left,  agree=0.806, adj=0.143, (0 split)
## 
## Node number 18: 24 observations
##   predicted class=B  expected loss=0.08333333  P(node) =0.06030151
##     class counts:    22     2
##    probabilities: 0.917 0.083 
## 
## Node number 19: 7 observations
##   predicted class=M  expected loss=0.2857143  P(node) =0.01758794
##     class counts:     2     5
##    probabilities: 0.286 0.714
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
y <- predict(tree1,type="prob",newdata=dataTE)

# label a case "B" when P(B) > 0.5, otherwise "M"
y1 <- ifelse(y[,"B"]>0.5,"B","M")
table(y1,dataTE$diagnosis)
##    
## y1   B  M
##   B 98  4
##   M  9 60
length(y1)
## [1] 171
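##Changing the depth parameter makes little difference: maxdepth=5, 6 and 7 grow the same 4-split tree (growth is limited by the other stopping rules, such as the default cp=0.01, rather than by depth) and give the same test confusion matrix, while maxdepth=3 makes slightly fewer test errors (10 instead of 13). Decreasing the depth below 3 would risk underfitting.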
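To put numbers on the depth comparison, the test confusion matrices printed above can be turned into accuracy figures. This is only a sketch: the counts are copied by hand from the output (rows are predictions, columns are true labels), and the helper name `accuracy` is ours, not part of rpart.

```r
# Test confusion matrices copied from the output above
# (rows = predicted class, columns = true class; matrix() fills column by column).
cm_depth3 <- matrix(c(101, 6, 4, 60), nrow = 2,
                    dimnames = list(pred = c("B", "M"), truth = c("B", "M")))
cm_deeper <- matrix(c(98, 9, 4, 60), nrow = 2,
                    dimnames = list(pred = c("B", "M"), truth = c("B", "M")))

# Accuracy = correctly classified cases / all cases.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

acc <- c(maxdepth_3 = accuracy(cm_depth3),
         maxdepth_5_to_7 = accuracy(cm_deeper))
print(round(acc, 4))
```

The gap is three test cases out of 171, so it should be read as a small difference rather than strong evidence for one depth.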

###RF

# trControl is an argument of caret::train(), not of randomForest(), where it is silently ignored; it is dropped here
rf.breast=randomForest(diagnosis~.,data=dataTR,mtry=4,ntree=500,nodesize=5)
plot(rf.breast)

varImpPlot(rf.breast)

pred.breast = predict(rf.breast,newdata=dataTE)
table(pred.breast,dataTE[[1]])
##            
## pred.breast   B   M
##           B 107   6
##           M   0  58
rf.breast=randomForest(diagnosis~.,data=dataTR,mtry=3,ntree=500,nodesize=5)
plot(rf.breast)

varImpPlot(rf.breast)

pred.breast = predict(rf.breast,newdata=dataTE)
table(pred.breast,dataTE[[1]])
##            
## pred.breast   B   M
##           B 107   4
##           M   0  60
rf.breast=randomForest(diagnosis~.,data=dataTR,mtry=5,ntree=500,nodesize=5)
plot(rf.breast)

varImpPlot(rf.breast)

pred.breast = predict(rf.breast,newdata=dataTE)
table(pred.breast,dataTE[[1]])
##            
## pred.breast   B   M
##           B 107   4
##           M   0  60
rf.breast=randomForest(diagnosis~.,data=dataTR,mtry=6,ntree=500,nodesize=5)
plot(rf.breast)

varImpPlot(rf.breast)

pred.breast = predict(rf.breast,newdata=dataTE)
table(pred.breast,dataTE[[1]])
##            
## pred.breast   B   M
##           B 107   6
##           M   0  58
rf.breast=randomForest(diagnosis~.,data=dataTR,mtry=7,ntree=500,nodesize=5)
plot(rf.breast)

varImpPlot(rf.breast)

pred.breast = predict(rf.breast,newdata=dataTE)
table(pred.breast,dataTE[[1]])
##            
## pred.breast   B   M
##           B 107   5
##           M   0  59
#mtry=3 (tied with mtry=5) works best: all 107 benign cases are classified correctly and only 4 of the 64 malignant cases are predicted as benign.
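The five random forest runs are easier to compare side by side. The sketch below hard-codes the counts from the confusion matrices above (an assumption of this example is that they are copied correctly) and derives the malignant-class recall and the overall test accuracy for each `mtry`; the name `rf_results` is ours.

```r
# Test-set counts copied from the five confusion matrices above.
# In every run all 107 benign cases were predicted as benign.
rf_results <- data.frame(
  mtry      = c(3, 4, 5, 6, 7),
  missed_M  = c(4, 6, 4, 6, 5),      # malignant cases predicted as "B"
  correct_M = c(60, 58, 60, 58, 59)  # malignant cases predicted as "M"
)

# Recall for the malignant class and accuracy over the 171 test cases.
rf_results$recall_M <- rf_results$correct_M /
  (rf_results$correct_M + rf_results$missed_M)
rf_results$accuracy <- (107 + rf_results$correct_M) / 171

# Best runs first.
print(rf_results[order(-rf_results$accuracy), ], digits = 3)
```

Because no seed is set in the runs above, differences of one or two cases between `mtry` values may partly reflect the randomness of the forest.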

###GBM

noftrees=100
depth=5
learning_rate=0.2
sampling_fraction=0.5


#boosting_model=gbm(diagnosis~.,distribution="bernoulli", data=dataTR, n.trees = noftrees,interaction.depth = depth,cv.folds=10,class.stratify.cv=TRUE, 
 #                  n.minobsinnode = 5, shrinkage =learning_rate,
#                   bag.fraction = sampling_fraction)
#boosting_model
#summary(boosting_model)

#pred.breast = predict.gbm(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)
#pred.breast = predict(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)

#y1 <- rep(0,length(pred.breast))
#ind1 <- pred.breast>0.5
#y1[ind1] <- "M"
#y1[!ind1] <- "B"
#table(y1,dataTE$diagnosis)
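Before the commented-out boosting call can run, the response has to be numeric 0/1 (as far as we know, gbm's "bernoulli" distribution expects that rather than a factor), and since `type="response"` yields probabilities, class labels are recovered with a 0.5 cut. The sketch below shows only these two steps in base R; the labels and probabilities are made up and stand in for `dataTE$diagnosis` and the model output.

```r
# Made-up stand-ins (assumptions for illustration only).
diagnosis <- factor(c("B", "M", "M", "B", "M"), levels = c("B", "M"))
prob_M    <- c(0.10, 0.85, 0.40, 0.30, 0.95)  # pretend P(diagnosis == "M")

# Step 1: recode the factor as 0/1 so a Bernoulli loss can use it (M = 1).
diagnosis01 <- as.integer(diagnosis == "M")

# Step 2: turn predicted probabilities back into class labels at 0.5.
pred_label <- factor(ifelse(prob_M > 0.5, "M", "B"), levels = c("B", "M"))

print(table(pred = pred_label, truth = diagnosis))
```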
###Spambase
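The Spambase file has no header row, so a plain `read.csv()` call treats the first record as column names. That is where names such as `X0` and `X0.64` in the structure listing below come from, and one record is lost. A small sketch with two made-up records in the same comma-separated layout shows the difference that `header = FALSE` makes:

```r
# Two made-up records in a headerless, comma-separated layout.
raw <- "0,0.64,1\n0.21,0.28,1\n"

with_header    <- read.csv(text = raw)                  # default: header = TRUE
without_header <- read.csv(text = raw, header = FALSE)

print(names(with_header))     # first record was turned into column names
print(nrow(with_header))      # only one data row is left
print(names(without_header))  # default names V1, V2, ...
print(nrow(without_header))   # both records are kept
```

Switching to `header = FALSE` changes the column names, so any later code that refers to columns by name would need to be adjusted accordingly.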



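# spambase.data has no header row: with the default header=TRUE the first record becomes the column names and is lost (header=FALSE would keep it)
data <- data.table(read.csv("spambase.data",stringsAsFactors=T))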
str(data)
## Classes 'data.table' and 'data.frame':   4600 obs. of  58 variables:
##  $ X0     : num  0.21 0.06 0 0 0 0 0 0.15 0.06 0 ...
##  $ X0.64  : num  0.28 0 0 0 0 0 0 0 0.12 0 ...
##  $ X0.64.1: num  0.5 0.71 0 0 0 0 0 0.46 0.77 0 ...
##  $ X0.1   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.32  : num  0.14 1.23 0.63 0.63 1.85 1.92 1.88 0.61 0.19 0 ...
##  $ X0.2   : num  0.28 0.19 0 0 0 0 0 0 0.32 0 ...
##  $ X0.3   : num  0.21 0.19 0.31 0.31 0 0 0 0.3 0.38 0.96 ...
##  $ X0.4   : num  0.07 0.12 0.63 0.63 1.85 0 1.88 0 0 0 ...
##  $ X0.5   : num  0 0.64 0.31 0.31 0 0 0 0.92 0.06 0 ...
##  $ X0.6   : num  0.94 0.25 0.63 0.63 0 0.64 0 0.76 0 1.92 ...
##  $ X0.7   : num  0.21 0.38 0.31 0.31 0 0.96 0 0.76 0 0.96 ...
##  $ X0.64.2: num  0.79 0.45 0.31 0.31 0 1.28 0 0.92 0.64 0 ...
##  $ X0.8   : num  0.65 0.12 0.31 0.31 0 0 0 0 0.25 0 ...
##  $ X0.9   : num  0.21 0 0 0 0 0 0 0 0 0 ...
##  $ X0.10  : num  0.14 1.75 0 0 0 0 0 0 0.12 0 ...
##  $ X0.32.1: num  0.14 0.06 0.31 0.31 0 0.96 0 0 0 0 ...
##  $ X0.11  : num  0.07 0.06 0 0 0 0 0 0 0 0 ...
##  $ X1.29  : num  0.28 1.03 0 0 0 0.32 0 0.15 0.12 0.96 ...
##  $ X1.93  : num  3.47 1.36 3.18 3.18 0 3.85 0 1.23 1.67 3.84 ...
##  $ X0.12  : num  0 0.32 0 0 0 0 0 3.53 0.06 0 ...
##  $ X0.96  : num  1.59 0.51 0.31 0.31 0 0.64 0 2 0.71 0.96 ...
##  $ X0.13  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.14  : num  0.43 1.16 0 0 0 0 0 0 0.19 0 ...
##  $ X0.15  : num  0.43 0.06 0 0 0 0 0 0.15 0 0 ...
##  $ X0.16  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.17  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.18  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.19  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.20  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.21  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.22  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.23  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.24  : num  0 0 0 0 0 0 0 0.15 0 0 ...
##  $ X0.25  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.26  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.27  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.28  : num  0.07 0 0 0 0 0 0 0 0 0 ...
##  $ X0.29  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.30  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.31  : num  0 0.06 0 0 0 0 0 0 0 0.96 ...
##  $ X0.33  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.34  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.35  : num  0 0.12 0 0 0 0 0 0.3 0 0 ...
##  $ X0.36  : num  0 0 0 0 0 0 0 0 0.06 0 ...
##  $ X0.37  : num  0 0.06 0 0 0 0 0 0 0 0 ...
##  $ X0.38  : num  0 0.06 0 0 0 0 0 0 0 0 ...
##  $ X0.39  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.40  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.41  : num  0 0.01 0 0 0 0 0 0 0.04 0 ...
##  $ X0.42  : num  0.132 0.143 0.137 0.135 0.223 0.054 0.206 0.271 0.03 0 ...
##  $ X0.43  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.778 : num  0.372 0.276 0.137 0.135 0 0.164 0 0.181 0.244 0.462 ...
##  $ X0.44  : num  0.18 0.184 0 0 0 0.054 0 0.203 0.081 0 ...
##  $ X0.45  : num  0.048 0.01 0 0 0 0 0 0.022 0 0 ...
##  $ X3.756 : num  5.11 9.82 3.54 3.54 3 ...
##  $ X61    : int  101 485 40 40 15 4 11 445 43 6 ...
##  $ X278   : int  1028 2259 191 191 54 112 49 1257 749 21 ...
##  $ X1     : int  1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr>
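The str() output above reports 4600 observations and column names such as X0 and X0.64, which is what read.csv() produces when a file without a header row is read with the default header = TRUE. A small sketch on a made-up three-line file shows the effect; the file contents are hypothetical and only mimic the shape of spambase.data.

```r
# Write a tiny header-less CSV to a temporary file (made-up values)
tmp <- tempfile(fileext = ".data")
writeLines(c("0,0.64,1", "0.21,0.28,1", "0.06,0,0"), tmp)

with_header    <- read.csv(tmp)                  # default header = TRUE
without_header <- read.csv(tmp, header = FALSE)  # keeps every record

nrow(with_header)      # 2: the first record became the column names
names(with_header)     # "X0" "X0.64" "X1"
nrow(without_header)   # 3
names(without_header)  # "V1" "V2" "V3"

unlink(tmp)
```

Reading the real file with header = FALSE would keep the first record, but the target column would then be called V58 instead of X1, so the rest of this section would need renaming.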
data$X1 <- as.factor(data$X1)
set.seed(582) #using caTools performing stratified sampling
split=sample.split(data$X1, SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
str(dataTR)
## Classes 'data.table' and 'data.frame':   3220 obs. of  58 variables:
##  $ X0     : num  0.06 0 0 0 0 0 0.15 0.06 0 0 ...
##  $ X0.64  : num  0 0 0 0 0 0 0 0.12 0 0.69 ...
##  $ X0.64.1: num  0.71 0 0 0 0 0 0.46 0.77 0.25 0.34 ...
##  $ X0.1   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.32  : num  1.23 0.63 0.63 1.85 1.92 1.88 0.61 0.19 0.38 0.34 ...
##  $ X0.2   : num  0.19 0 0 0 0 0 0 0.32 0.25 0 ...
##  $ X0.3   : num  0.19 0.31 0.31 0 0 0 0.3 0.38 0.25 0 ...
##  $ X0.4   : num  0.12 0.63 0.63 1.85 0 1.88 0 0 0 0 ...
##  $ X0.5   : num  0.64 0.31 0.31 0 0 0 0.92 0.06 0 0 ...
##  $ X0.6   : num  0.25 0.63 0.63 0 0.64 0 0.76 0 0 0 ...
##  $ X0.7   : num  0.38 0.31 0.31 0 0.96 0 0.76 0 0.12 0 ...
##  $ X0.64.2: num  0.45 0.31 0.31 0 1.28 0 0.92 0.64 0.12 0.69 ...
##  $ X0.8   : num  0.12 0.31 0.31 0 0 0 0 0.25 0.12 0 ...
##  $ X0.9   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.10  : num  1.75 0 0 0 0 0 0 0.12 0 0 ...
##  $ X0.32.1: num  0.06 0.31 0.31 0 0.96 0 0 0 0 0.34 ...
##  $ X0.11  : num  0.06 0 0 0 0 0 0 0 0 0 ...
##  $ X1.29  : num  1.03 0 0 0 0.32 0 0.15 0.12 0 1.39 ...
##  $ X1.93  : num  1.36 3.18 3.18 0 3.85 0 1.23 1.67 1.16 2.09 ...
##  $ X0.12  : num  0.32 0 0 0 0 0 3.53 0.06 0 0 ...
##  $ X0.96  : num  0.51 0.31 0.31 0 0.64 0 2 0.71 0.77 1.04 ...
##  $ X0.13  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.14  : num  1.16 0 0 0 0 0 0 0.19 0 0 ...
##  $ X0.15  : num  0.06 0 0 0 0 0 0.15 0 0 0 ...
##  $ X0.16  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.17  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.18  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.19  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.20  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.21  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.22  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.23  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.24  : num  0 0 0 0 0 0 0.15 0 0 0 ...
##  $ X0.25  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.26  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.27  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.28  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.29  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.30  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.31  : num  0.06 0 0 0 0 0 0 0 0 0 ...
##  $ X0.33  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.34  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.35  : num  0.12 0 0 0 0 0 0.3 0 0 0 ...
##  $ X0.36  : num  0 0 0 0 0 0 0 0.06 0 0 ...
##  $ X0.37  : num  0.06 0 0 0 0 0 0 0 0 0 ...
##  $ X0.38  : num  0.06 0 0 0 0 0 0 0 0 0 ...
##  $ X0.39  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.40  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.41  : num  0.01 0 0 0 0 0 0 0.04 0.022 0 ...
##  $ X0.42  : num  0.143 0.137 0.135 0.223 0.054 0.206 0.271 0.03 0.044 0.056 ...
##  $ X0.43  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X0.778 : num  0.276 0.137 0.135 0 0.164 0 0.181 0.244 0.663 0.786 ...
##  $ X0.44  : num  0.184 0 0 0 0.054 0 0.203 0.081 0 0 ...
##  $ X0.45  : num  0.01 0 0 0 0 0 0.022 0 0 0 ...
##  $ X3.756 : num  9.82 3.54 3.54 3 1.67 ...
##  $ X61    : int  485 40 40 15 4 11 445 43 11 61 ...
##  $ X278   : int  2259 191 191 54 112 49 1257 749 184 261 ...
##  $ X1     : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  - attr(*, ".internal.selfref")=<externalptr>
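sample.split() from caTools keeps the class ratio of X1 the same in both parts. As a rough illustration of what that stratification means, the base R sketch below draws 70% of the rows within each class separately; the label vector is toy data, and this is not the caTools implementation.

```r
set.seed(582)

# Toy labels with roughly the 60/40 class balance seen above (hypothetical data)
label <- factor(rep(c("0", "1"), times = c(60, 40)))

# Within each class, draw 70% of the row indices for training
train_idx <- unlist(lapply(split(seq_along(label), label), function(idx) {
  sample(idx, size = round(0.7 * length(idx)))
}))

in_train <- seq_along(label) %in% train_idx

table(label[in_train])   # 42 zeros, 28 ones
table(label[!in_train])  # 18 zeros, 12 ones
```

Both parts keep the 60/40 ratio, which a plain sample() over all rows would only match on average.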
##knn

knnFit <- train(X1~ ., data = dataTR, method = "knn", trControl = trainControl(method = "cv"),preProcess = c("center","scale"), tuneGrid = expand.grid(k=c(3,5,7,9,11)))
knnFit #k=5 was optimal
## k-Nearest Neighbors 
## 
## 3220 samples
##   57 predictor
##    2 classes: '0', '1' 
## 
## Pre-processing: centered (57), scaled (57) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 2897, 2898, 2898, 2898, 2898, 2898, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    3  0.8999863  0.7883162
##    5  0.9024863  0.7933623
##    7  0.9012402  0.7903883
##    9  0.8996893  0.7866130
##   11  0.9018681  0.7906853
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
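The preProcess = c("center","scale") option matters for knn because unscaled distances would be dominated by wide-range columns such as X278. caret estimates the means and standard deviations on the training data and reuses them when predicting on new data. A minimal base R sketch of that idea, on made-up numbers (train_x and test_x are hypothetical):

```r
# Hypothetical training and test matrices: column 2 has a much wider range
train_x <- matrix(c(1, 2, 3, 4,
                    100, 200, 300, 400), ncol = 2)
test_x  <- matrix(c(2.5, 250), ncol = 2)

# Estimate centering and scaling constants on the training data only
centers <- colMeans(train_x)
scales  <- apply(train_x, 2, sd)

# Apply the same constants to both sets
train_z <- scale(train_x, center = centers, scale = scales)
test_z  <- scale(test_x,  center = centers, scale = scales)

test_z  # both entries are 0: the test point sits at the training means
```

Computing the constants on the test set instead would leak test information into the model and give distances that are not comparable with the training points.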
y <- predict(knnFit,newdata=dataTE)
y
##    [1] 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
##   [38] 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0
##   [75] 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1
##  [112] 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1
##  [149] 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
##  [186] 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1
##  [223] 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
##  [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
##  [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
##  [334] 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1
##  [371] 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [408] 1 0 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1
##  [445] 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0
##  [482] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1
##  [519] 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
##  [556] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [593] 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
##  [630] 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
##  [704] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [741] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0
##  [778] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [815] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [852] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [889] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [926] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [963] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0
## [1000] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1037] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [1074] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [1111] 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [1148] 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1185] 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1222] 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0
## [1259] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [1296] 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## [1333] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1370] 0 0 0 0 0 0 0 0 0 0 0
## Levels: 0 1
table(y,dataTE$X1)
##    
## y     0   1
##   0 777  72
##   1  59 472
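The confusion matrix is easier to compare across learners once it is reduced to a few rates. The sketch below recomputes accuracy, sensitivity and specificity from the counts printed above, assuming that rows are predictions, columns are true labels, and 1 marks spam as in the UCI description.

```r
# Confusion matrix copied from the output above: rows = predicted, columns = actual
cm <- matrix(c(777, 59, 72, 472), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)        # share of correct predictions
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # spam correctly flagged
specificity <- cm["0", "0"] / sum(cm[, "0"])  # non-spam correctly passed

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 3)
# accuracy 0.905, sensitivity 0.868, specificity 0.929
```

The test accuracy of about 0.905 is close to the cross-validated 0.902 reported for k = 5.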
#Rpart

tree1=rpart(dataTR$X1~.,method="class",data=dataTR,maxdepth=10,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$X1 ~ ., data = dataTR, method = "class", 
##     maxdepth = 10, xval = 10)
##   n= 3220 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.47949527      0 1.0000000 1.0000000 0.02186514
## 2 0.07334385      1 0.5205047 0.5425868 0.01834336
## 3 0.06782334      2 0.4471609 0.4179811 0.01659460
## 4 0.02208202      4 0.3115142 0.3430599 0.01529713
## 5 0.01813880      5 0.2894322 0.3217666 0.01488643
## 6 0.01419558      6 0.2712934 0.3044164 0.01453605
## 7 0.01025237      7 0.2570978 0.2791798 0.01399886
## 8 0.01000000      8 0.2468454 0.2720820 0.01384144
## 
## Variable importance
##  X0.778     X61   X0.44 X0.32.1   X0.96    X0.3 X0.64.1  X3.756    X278   X0.15 
##      24      14      11      10       9       8       6       5       5       2 
##   X0.32   X0.16   X0.14   X0.17 
##       2       1       1       1 
## 
## Node number 1: 3220 observations,    complexity param=0.4794953
##   predicted class=0  expected loss=0.3937888  P(node) =1
##     class counts:  1952  1268
##    probabilities: 0.606 0.394 
##   left son=2 (1858 obs) right son=3 (1362 obs)
##   Primary splits:
##       X0.778  < 0.0785 to the left,  improve=512.2678, (0 missing)
##       X0.44   < 0.0555 to the left,  improve=499.2630, (0 missing)
##       X0.3    < 0.01   to the left,  improve=434.2978, (0 missing)
##       X0.32.1 < 0.095  to the left,  improve=404.5992, (0 missing)
##       X0.96   < 0.405  to the left,  improve=398.9498, (0 missing)
##   Surrogate splits:
##       X0.44   < 0.0455 to the left,  agree=0.707, adj=0.308, (0 split)
##       X0.32.1 < 0.095  to the left,  agree=0.706, adj=0.304, (0 split)
##       X0.96   < 0.465  to the left,  agree=0.704, adj=0.300, (0 split)
##       X61     < 47.5   to the left,  agree=0.689, adj=0.264, (0 split)
##       X0.64.1 < 0.265  to the left,  agree=0.684, adj=0.252, (0 split)
## 
## Node number 2: 1858 observations,    complexity param=0.07334385
##   predicted class=0  expected loss=0.1523143  P(node) =0.5770186
##     class counts:  1575   283
##    probabilities: 0.848 0.152 
##   left son=4 (1721 obs) right son=5 (137 obs)
##   Primary splits:
##       X0.3    < 0.02   to the left,  improve=139.65530, (0 missing)
##       X0.44   < 0.081  to the left,  improve=100.73830, (0 missing)
##       X0.15   < 0.01   to the left,  improve= 91.18909, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve= 67.22638, (0 missing)
##       X0.96   < 0.985  to the left,  improve= 48.45695, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.485  to the left,  agree=0.930, adj=0.051, (0 split)
##       X0.45 < 1.063  to the left,  agree=0.929, adj=0.044, (0 split)
##       X0.12 < 2.375  to the left,  agree=0.927, adj=0.015, (0 split)
##       X0.11 < 2.14   to the left,  agree=0.927, adj=0.007, (0 split)
##       X0.13 < 8.945  to the left,  agree=0.927, adj=0.007, (0 split)
## 
## Node number 3: 1362 observations,    complexity param=0.06782334
##   predicted class=1  expected loss=0.2767988  P(node) =0.4229814
##     class counts:   377   985
##    probabilities: 0.277 0.723 
##   left son=6 (526 obs) right son=7 (836 obs)
##   Primary splits:
##       X61    < 18.5   to the left,  improve=155.4341, (0 missing)
##       X3.756 < 2.292  to the left,  improve=154.9768, (0 missing)
##       X0.44  < 0.0065 to the left,  improve=140.6588, (0 missing)
##       X278   < 81.5   to the left,  improve=137.7187, (0 missing)
##       X0.96  < 0.395  to the left,  improve=121.1891, (0 missing)
##   Surrogate splits:
##       X3.756 < 2.323  to the left,  agree=0.874, adj=0.675, (0 split)
##       X278   < 105.5  to the left,  agree=0.858, adj=0.631, (0 split)
##       X0.96  < 0.15   to the left,  agree=0.726, adj=0.291, (0 split)
##       X0.44  < 0.0065 to the left,  agree=0.725, adj=0.287, (0 split)
##       X0.32  < 0.01   to the left,  agree=0.695, adj=0.211, (0 split)
## 
## Node number 4: 1721 observations,    complexity param=0.0181388
##   predicted class=0  expected loss=0.09761766  P(node) =0.534472
##     class counts:  1553   168
##    probabilities: 0.902 0.098 
##   left son=8 (1650 obs) right son=9 (71 obs)
##   Primary splits:
##       X0.15   < 0.01   to the left,  improve=47.17248, (0 missing)
##       X0.44   < 0.1675 to the left,  improve=44.56346, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve=22.88287, (0 missing)
##       X0.14   < 0.135  to the left,  improve=16.86471, (0 missing)
##       X0.96   < 3.815  to the left,  improve=15.98877, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.815  to the left,  agree=0.959, adj=0.014, (0 split)
##       X278  < 7241.5 to the left,  agree=0.959, adj=0.014, (0 split)
## 
## Node number 5: 137 observations
##   predicted class=1  expected loss=0.1605839  P(node) =0.04254658
##     class counts:    22   115
##    probabilities: 0.161 0.839 
## 
## Node number 6: 526 observations,    complexity param=0.06782334
##   predicted class=0  expected loss=0.4220532  P(node) =0.163354
##     class counts:   304   222
##    probabilities: 0.578 0.422 
##   left son=12 (386 obs) right son=13 (140 obs)
##   Primary splits:
##       X0.32.1 < 0.1    to the left,  improve=60.85818, (0 missing)
##       X0.3    < 0.09   to the left,  improve=46.85309, (0 missing)
##       X0.44   < 0.0125 to the left,  improve=45.41479, (0 missing)
##       X0.96   < 0.4    to the left,  improve=34.61798, (0 missing)
##       X0.14   < 0.17   to the left,  improve=31.18934, (0 missing)
##   Surrogate splits:
##       X0.3  < 0.19   to the left,  agree=0.764, adj=0.114, (0 split)
##       X0.7  < 0.3    to the left,  agree=0.760, adj=0.100, (0 split)
##       X0.15 < 0.175  to the left,  agree=0.753, adj=0.071, (0 split)
##       X0.14 < 0.065  to the left,  agree=0.747, adj=0.050, (0 split)
##       X0.4  < 1.22   to the left,  agree=0.743, adj=0.036, (0 split)
## 
## Node number 7: 836 observations,    complexity param=0.01419558
##   predicted class=1  expected loss=0.08732057  P(node) =0.2596273
##     class counts:    73   763
##    probabilities: 0.087 0.913 
##   left son=14 (26 obs) right son=15 (810 obs)
##   Primary splits:
##       X0.16 < 0.39   to the right, improve=30.90419, (0 missing)
##       X0.17 < 0.13   to the right, improve=22.47071, (0 missing)
##       X0.18 < 0.45   to the right, improve=18.56998, (0 missing)
##       X0.21 < 0.03   to the right, improve=16.87615, (0 missing)
##       X0.38 < 0.52   to the right, improve=15.15688, (0 missing)
##   Surrogate splits:
##       X0.17 < 0.215  to the right, agree=0.986, adj=0.538, (0 split)
##       X0.21 < 0.03   to the right, agree=0.978, adj=0.308, (0 split)
##       X0.18 < 0.45   to the right, agree=0.972, adj=0.115, (0 split)
##       X0.20 < 0.105  to the right, agree=0.972, adj=0.115, (0 split)
##       X0.19 < 0.705  to the right, agree=0.971, adj=0.077, (0 split)
## 
## Node number 8: 1650 observations
##   predicted class=0  expected loss=0.07333333  P(node) =0.5124224
##     class counts:  1529   121
##    probabilities: 0.927 0.073 
## 
## Node number 9: 71 observations
##   predicted class=1  expected loss=0.3380282  P(node) =0.02204969
##     class counts:    24    47
##    probabilities: 0.338 0.662 
## 
## Node number 12: 386 observations,    complexity param=0.02208202
##   predicted class=0  expected loss=0.2772021  P(node) =0.1198758
##     class counts:   279   107
##    probabilities: 0.723 0.277 
##   left son=24 (336 obs) right son=25 (50 obs)
##   Primary splits:
##       X0.44 < 0.1065 to the left,  improve=29.04257, (0 missing)
##       X0.3  < 0.24   to the left,  improve=22.60497, (0 missing)
##       X0.14 < 0.23   to the left,  improve=22.03941, (0 missing)
##       X0.96 < 0.615  to the left,  improve=20.87466, (0 missing)
##       X0.4  < 0.3    to the left,  improve=20.57106, (0 missing)
##   Surrogate splits:
##       X0.14 < 1.045  to the left,  agree=0.902, adj=0.24, (0 split)
##       X0.2  < 1.175  to the left,  agree=0.896, adj=0.20, (0 split)
##       X0.10 < 2.005  to the left,  agree=0.878, adj=0.06, (0 split)
##       X0.11 < 1.415  to the left,  agree=0.876, adj=0.04, (0 split)
##       X0.4  < 0.28   to the left,  agree=0.873, adj=0.02, (0 split)
## 
## Node number 13: 140 observations
##   predicted class=1  expected loss=0.1785714  P(node) =0.04347826
##     class counts:    25   115
##    probabilities: 0.179 0.821 
## 
## Node number 14: 26 observations
##   predicted class=0  expected loss=0.1538462  P(node) =0.008074534
##     class counts:    22     4
##    probabilities: 0.846 0.154 
## 
## Node number 15: 810 observations
##   predicted class=1  expected loss=0.06296296  P(node) =0.2515528
##     class counts:    51   759
##    probabilities: 0.063 0.937 
## 
## Node number 24: 336 observations,    complexity param=0.01025237
##   predicted class=0  expected loss=0.202381  P(node) =0.1043478
##     class counts:   268    68
##    probabilities: 0.798 0.202 
##   left son=48 (313 obs) right son=49 (23 obs)
##   Primary splits:
##       X0.3   < 0.09   to the left,  improve=16.624540, (0 missing)
##       X0.11  < 0.07   to the left,  improve=12.322880, (0 missing)
##       X3.756 < 3.5755 to the left,  improve=11.766710, (0 missing)
##       X0.4   < 0.08   to the left,  improve= 8.852514, (0 missing)
##       X0.32  < 0.66   to the left,  improve= 8.530257, (0 missing)
##   Surrogate splits:
##       X0.12 < 0.56   to the left,  agree=0.938, adj=0.087, (0 split)
## 
## Node number 25: 50 observations
##   predicted class=1  expected loss=0.22  P(node) =0.01552795
##     class counts:    11    39
##    probabilities: 0.220 0.780 
## 
## Node number 48: 313 observations
##   predicted class=0  expected loss=0.1597444  P(node) =0.09720497
##     class counts:   263    50
##    probabilities: 0.840 0.160 
## 
## Node number 49: 23 observations
##   predicted class=1  expected loss=0.2173913  P(node) =0.007142857
##     class counts:     5    18
##    probabilities: 0.217 0.783
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
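y <- predict(tree1,type="prob",newdata=dataTE) #predict.rpart has no method argument; type="prob" (the default for class trees) returns class probabilities, column 1 being P(class 0)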

y1 <- rep(0,length(dataTE$X1))
y1[y[,1]>0.5] <- 0
y1[!(y[,1]>0.5)] <- 1
table(y1,dataTE$X1)
##    
## y1    0   1
##   0 777  94
##   1  59 450
tree1=rpart(dataTR$X1~.,method="class",data=dataTR,maxdepth=9,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$X1 ~ ., data = dataTR, method = "class", 
##     maxdepth = 9, xval = 10)
##   n= 3220 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.47949527      0 1.0000000 1.0000000 0.02186514
## 2 0.07334385      1 0.5205047 0.5339117 0.01823561
## 3 0.06782334      2 0.4471609 0.4526814 0.01712788
## 4 0.02208202      4 0.3115142 0.3391167 0.01522261
## 5 0.01813880      5 0.2894322 0.3209779 0.01487082
## 6 0.01419558      6 0.2712934 0.3020505 0.01448711
## 7 0.01025237      7 0.2570978 0.2870662 0.01417042
## 8 0.01000000      8 0.2468454 0.2791798 0.01399886
## 
## Variable importance
##  X0.778     X61   X0.44 X0.32.1   X0.96    X0.3 X0.64.1  X3.756    X278   X0.15 
##      24      14      11      10       9       8       6       5       5       2 
##   X0.32   X0.16   X0.14   X0.17 
##       2       1       1       1 
## 
## Node number 1: 3220 observations,    complexity param=0.4794953
##   predicted class=0  expected loss=0.3937888  P(node) =1
##     class counts:  1952  1268
##    probabilities: 0.606 0.394 
##   left son=2 (1858 obs) right son=3 (1362 obs)
##   Primary splits:
##       X0.778  < 0.0785 to the left,  improve=512.2678, (0 missing)
##       X0.44   < 0.0555 to the left,  improve=499.2630, (0 missing)
##       X0.3    < 0.01   to the left,  improve=434.2978, (0 missing)
##       X0.32.1 < 0.095  to the left,  improve=404.5992, (0 missing)
##       X0.96   < 0.405  to the left,  improve=398.9498, (0 missing)
##   Surrogate splits:
##       X0.44   < 0.0455 to the left,  agree=0.707, adj=0.308, (0 split)
##       X0.32.1 < 0.095  to the left,  agree=0.706, adj=0.304, (0 split)
##       X0.96   < 0.465  to the left,  agree=0.704, adj=0.300, (0 split)
##       X61     < 47.5   to the left,  agree=0.689, adj=0.264, (0 split)
##       X0.64.1 < 0.265  to the left,  agree=0.684, adj=0.252, (0 split)
## 
## Node number 2: 1858 observations,    complexity param=0.07334385
##   predicted class=0  expected loss=0.1523143  P(node) =0.5770186
##     class counts:  1575   283
##    probabilities: 0.848 0.152 
##   left son=4 (1721 obs) right son=5 (137 obs)
##   Primary splits:
##       X0.3    < 0.02   to the left,  improve=139.65530, (0 missing)
##       X0.44   < 0.081  to the left,  improve=100.73830, (0 missing)
##       X0.15   < 0.01   to the left,  improve= 91.18909, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve= 67.22638, (0 missing)
##       X0.96   < 0.985  to the left,  improve= 48.45695, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.485  to the left,  agree=0.930, adj=0.051, (0 split)
##       X0.45 < 1.063  to the left,  agree=0.929, adj=0.044, (0 split)
##       X0.12 < 2.375  to the left,  agree=0.927, adj=0.015, (0 split)
##       X0.11 < 2.14   to the left,  agree=0.927, adj=0.007, (0 split)
##       X0.13 < 8.945  to the left,  agree=0.927, adj=0.007, (0 split)
## 
## Node number 3: 1362 observations,    complexity param=0.06782334
##   predicted class=1  expected loss=0.2767988  P(node) =0.4229814
##     class counts:   377   985
##    probabilities: 0.277 0.723 
##   left son=6 (526 obs) right son=7 (836 obs)
##   Primary splits:
##       X61    < 18.5   to the left,  improve=155.4341, (0 missing)
##       X3.756 < 2.292  to the left,  improve=154.9768, (0 missing)
##       X0.44  < 0.0065 to the left,  improve=140.6588, (0 missing)
##       X278   < 81.5   to the left,  improve=137.7187, (0 missing)
##       X0.96  < 0.395  to the left,  improve=121.1891, (0 missing)
##   Surrogate splits:
##       X3.756 < 2.323  to the left,  agree=0.874, adj=0.675, (0 split)
##       X278   < 105.5  to the left,  agree=0.858, adj=0.631, (0 split)
##       X0.96  < 0.15   to the left,  agree=0.726, adj=0.291, (0 split)
##       X0.44  < 0.0065 to the left,  agree=0.725, adj=0.287, (0 split)
##       X0.32  < 0.01   to the left,  agree=0.695, adj=0.211, (0 split)
## 
## Node number 4: 1721 observations,    complexity param=0.0181388
##   predicted class=0  expected loss=0.09761766  P(node) =0.534472
##     class counts:  1553   168
##    probabilities: 0.902 0.098 
##   left son=8 (1650 obs) right son=9 (71 obs)
##   Primary splits:
##       X0.15   < 0.01   to the left,  improve=47.17248, (0 missing)
##       X0.44   < 0.1675 to the left,  improve=44.56346, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve=22.88287, (0 missing)
##       X0.14   < 0.135  to the left,  improve=16.86471, (0 missing)
##       X0.96   < 3.815  to the left,  improve=15.98877, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.815  to the left,  agree=0.959, adj=0.014, (0 split)
##       X278  < 7241.5 to the left,  agree=0.959, adj=0.014, (0 split)
## 
## Node number 5: 137 observations
##   predicted class=1  expected loss=0.1605839  P(node) =0.04254658
##     class counts:    22   115
##    probabilities: 0.161 0.839 
## 
## Node number 6: 526 observations,    complexity param=0.06782334
##   predicted class=0  expected loss=0.4220532  P(node) =0.163354
##     class counts:   304   222
##    probabilities: 0.578 0.422 
##   left son=12 (386 obs) right son=13 (140 obs)
##   Primary splits:
##       X0.32.1 < 0.1    to the left,  improve=60.85818, (0 missing)
##       X0.3    < 0.09   to the left,  improve=46.85309, (0 missing)
##       X0.44   < 0.0125 to the left,  improve=45.41479, (0 missing)
##       X0.96   < 0.4    to the left,  improve=34.61798, (0 missing)
##       X0.14   < 0.17   to the left,  improve=31.18934, (0 missing)
##   Surrogate splits:
##       X0.3  < 0.19   to the left,  agree=0.764, adj=0.114, (0 split)
##       X0.7  < 0.3    to the left,  agree=0.760, adj=0.100, (0 split)
##       X0.15 < 0.175  to the left,  agree=0.753, adj=0.071, (0 split)
##       X0.14 < 0.065  to the left,  agree=0.747, adj=0.050, (0 split)
##       X0.4  < 1.22   to the left,  agree=0.743, adj=0.036, (0 split)
## 
## Node number 7: 836 observations,    complexity param=0.01419558
##   predicted class=1  expected loss=0.08732057  P(node) =0.2596273
##     class counts:    73   763
##    probabilities: 0.087 0.913 
##   left son=14 (26 obs) right son=15 (810 obs)
##   Primary splits:
##       X0.16 < 0.39   to the right, improve=30.90419, (0 missing)
##       X0.17 < 0.13   to the right, improve=22.47071, (0 missing)
##       X0.18 < 0.45   to the right, improve=18.56998, (0 missing)
##       X0.21 < 0.03   to the right, improve=16.87615, (0 missing)
##       X0.38 < 0.52   to the right, improve=15.15688, (0 missing)
##   Surrogate splits:
##       X0.17 < 0.215  to the right, agree=0.986, adj=0.538, (0 split)
##       X0.21 < 0.03   to the right, agree=0.978, adj=0.308, (0 split)
##       X0.18 < 0.45   to the right, agree=0.972, adj=0.115, (0 split)
##       X0.20 < 0.105  to the right, agree=0.972, adj=0.115, (0 split)
##       X0.19 < 0.705  to the right, agree=0.971, adj=0.077, (0 split)
## 
## Node number 8: 1650 observations
##   predicted class=0  expected loss=0.07333333  P(node) =0.5124224
##     class counts:  1529   121
##    probabilities: 0.927 0.073 
## 
## Node number 9: 71 observations
##   predicted class=1  expected loss=0.3380282  P(node) =0.02204969
##     class counts:    24    47
##    probabilities: 0.338 0.662 
## 
## Node number 12: 386 observations,    complexity param=0.02208202
##   predicted class=0  expected loss=0.2772021  P(node) =0.1198758
##     class counts:   279   107
##    probabilities: 0.723 0.277 
##   left son=24 (336 obs) right son=25 (50 obs)
##   Primary splits:
##       X0.44 < 0.1065 to the left,  improve=29.04257, (0 missing)
##       X0.3  < 0.24   to the left,  improve=22.60497, (0 missing)
##       X0.14 < 0.23   to the left,  improve=22.03941, (0 missing)
##       X0.96 < 0.615  to the left,  improve=20.87466, (0 missing)
##       X0.4  < 0.3    to the left,  improve=20.57106, (0 missing)
##   Surrogate splits:
##       X0.14 < 1.045  to the left,  agree=0.902, adj=0.24, (0 split)
##       X0.2  < 1.175  to the left,  agree=0.896, adj=0.20, (0 split)
##       X0.10 < 2.005  to the left,  agree=0.878, adj=0.06, (0 split)
##       X0.11 < 1.415  to the left,  agree=0.876, adj=0.04, (0 split)
##       X0.4  < 0.28   to the left,  agree=0.873, adj=0.02, (0 split)
## 
## Node number 13: 140 observations
##   predicted class=1  expected loss=0.1785714  P(node) =0.04347826
##     class counts:    25   115
##    probabilities: 0.179 0.821 
## 
## Node number 14: 26 observations
##   predicted class=0  expected loss=0.1538462  P(node) =0.008074534
##     class counts:    22     4
##    probabilities: 0.846 0.154 
## 
## Node number 15: 810 observations
##   predicted class=1  expected loss=0.06296296  P(node) =0.2515528
##     class counts:    51   759
##    probabilities: 0.063 0.937 
## 
## Node number 24: 336 observations,    complexity param=0.01025237
##   predicted class=0  expected loss=0.202381  P(node) =0.1043478
##     class counts:   268    68
##    probabilities: 0.798 0.202 
##   left son=48 (313 obs) right son=49 (23 obs)
##   Primary splits:
##       X0.3   < 0.09   to the left,  improve=16.624540, (0 missing)
##       X0.11  < 0.07   to the left,  improve=12.322880, (0 missing)
##       X3.756 < 3.5755 to the left,  improve=11.766710, (0 missing)
##       X0.4   < 0.08   to the left,  improve= 8.852514, (0 missing)
##       X0.32  < 0.66   to the left,  improve= 8.530257, (0 missing)
##   Surrogate splits:
##       X0.12 < 0.56   to the left,  agree=0.938, adj=0.087, (0 split)
## 
## Node number 25: 50 observations
##   predicted class=1  expected loss=0.22  P(node) =0.01552795
##     class counts:    11    39
##    probabilities: 0.220 0.780 
## 
## Node number 48: 313 observations
##   predicted class=0  expected loss=0.1597444  P(node) =0.09720497
##     class counts:   263    50
##    probabilities: 0.840 0.160 
## 
## Node number 49: 23 observations
##   predicted class=1  expected loss=0.2173913  P(node) =0.007142857
##     class counts:     5    18
##    probabilities: 0.217 0.783
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

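y <- predict(tree1,type="prob",newdata=dataTE) #predict.rpart has no method argument; type="prob" (the default for class trees) returns class probabilities, column 1 being P(class 0)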

y1 <- rep(0,length(dataTE$X1))
y1[y[,1]>0.5] <- 0
y1[!(y[,1]>0.5)] <- 1
table(y1,dataTE$X1)
##    
## y1    0   1
##   0 777  94
##   1  59 450
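The maxdepth=10 and maxdepth=9 fits give the same splits and the same confusion matrix: the deepest nodes (48 and 49) are at depth 5, so growth is stopped by the default cp=0.01 rather than by maxdepth. In both CP tables xerror is still falling at the last row, which suggests that cp may be the more useful knob to tune. The sketch below shows one common way to do that: grow a larger tree with a small cp, then prune back to the cp with the lowest cross-validated error. It uses the kyphosis data that ships with rpart as a stand-in, and cp = 0.001 is an arbitrary illustrative choice.

```r
library(rpart)

set.seed(582)
# kyphosis ships with rpart and stands in for the spam data here
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.001, xval = 10)

# The CP table holds the cross-validated error for each candidate tree size
cp_table <- fit$cptable
best_row <- which.min(cp_table[, "xerror"])  # row with the lowest CV error
best_cp  <- cp_table[best_row, "CP"]

# Prune the large tree back to that complexity
pruned <- prune(fit, cp = best_cp)
printcp(pruned)
```

A more conservative variant picks the smallest tree whose xerror is within one xstd of the minimum.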
tree1=rpart(dataTR$X1~.,method="class",data=dataTR,maxdepth=6,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$X1 ~ ., data = dataTR, method = "class", 
##     maxdepth = 6, xval = 10)
##   n= 3220 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.47949527      0 1.0000000 1.0000000 0.02186514
## 2 0.07334385      1 0.5205047 0.5354890 0.01825534
## 3 0.06782334      2 0.4471609 0.4487382 0.01706923
## 4 0.02208202      4 0.3115142 0.3359621 0.01516250
## 5 0.01813880      5 0.2894322 0.3272871 0.01499489
## 6 0.01419558      6 0.2712934 0.3036278 0.01451977
## 7 0.01025237      7 0.2570978 0.2768139 0.01394671
## 8 0.01000000      8 0.2468454 0.2728707 0.01385907
## 
## Variable importance
##  X0.778     X61   X0.44 X0.32.1   X0.96    X0.3 X0.64.1  X3.756    X278   X0.15 
##      24      14      11      10       9       8       6       5       5       2 
##   X0.32   X0.16   X0.14   X0.17 
##       2       1       1       1 
## 
## Node number 1: 3220 observations,    complexity param=0.4794953
##   predicted class=0  expected loss=0.3937888  P(node) =1
##     class counts:  1952  1268
##    probabilities: 0.606 0.394 
##   left son=2 (1858 obs) right son=3 (1362 obs)
##   Primary splits:
##       X0.778  < 0.0785 to the left,  improve=512.2678, (0 missing)
##       X0.44   < 0.0555 to the left,  improve=499.2630, (0 missing)
##       X0.3    < 0.01   to the left,  improve=434.2978, (0 missing)
##       X0.32.1 < 0.095  to the left,  improve=404.5992, (0 missing)
##       X0.96   < 0.405  to the left,  improve=398.9498, (0 missing)
##   Surrogate splits:
##       X0.44   < 0.0455 to the left,  agree=0.707, adj=0.308, (0 split)
##       X0.32.1 < 0.095  to the left,  agree=0.706, adj=0.304, (0 split)
##       X0.96   < 0.465  to the left,  agree=0.704, adj=0.300, (0 split)
##       X61     < 47.5   to the left,  agree=0.689, adj=0.264, (0 split)
##       X0.64.1 < 0.265  to the left,  agree=0.684, adj=0.252, (0 split)
## 
## Node number 2: 1858 observations,    complexity param=0.07334385
##   predicted class=0  expected loss=0.1523143  P(node) =0.5770186
##     class counts:  1575   283
##    probabilities: 0.848 0.152 
##   left son=4 (1721 obs) right son=5 (137 obs)
##   Primary splits:
##       X0.3    < 0.02   to the left,  improve=139.65530, (0 missing)
##       X0.44   < 0.081  to the left,  improve=100.73830, (0 missing)
##       X0.15   < 0.01   to the left,  improve= 91.18909, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve= 67.22638, (0 missing)
##       X0.96   < 0.985  to the left,  improve= 48.45695, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.485  to the left,  agree=0.930, adj=0.051, (0 split)
##       X0.45 < 1.063  to the left,  agree=0.929, adj=0.044, (0 split)
##       X0.12 < 2.375  to the left,  agree=0.927, adj=0.015, (0 split)
##       X0.11 < 2.14   to the left,  agree=0.927, adj=0.007, (0 split)
##       X0.13 < 8.945  to the left,  agree=0.927, adj=0.007, (0 split)
## 
## Node number 3: 1362 observations,    complexity param=0.06782334
##   predicted class=1  expected loss=0.2767988  P(node) =0.4229814
##     class counts:   377   985
##    probabilities: 0.277 0.723 
##   left son=6 (526 obs) right son=7 (836 obs)
##   Primary splits:
##       X61    < 18.5   to the left,  improve=155.4341, (0 missing)
##       X3.756 < 2.292  to the left,  improve=154.9768, (0 missing)
##       X0.44  < 0.0065 to the left,  improve=140.6588, (0 missing)
##       X278   < 81.5   to the left,  improve=137.7187, (0 missing)
##       X0.96  < 0.395  to the left,  improve=121.1891, (0 missing)
##   Surrogate splits:
##       X3.756 < 2.323  to the left,  agree=0.874, adj=0.675, (0 split)
##       X278   < 105.5  to the left,  agree=0.858, adj=0.631, (0 split)
##       X0.96  < 0.15   to the left,  agree=0.726, adj=0.291, (0 split)
##       X0.44  < 0.0065 to the left,  agree=0.725, adj=0.287, (0 split)
##       X0.32  < 0.01   to the left,  agree=0.695, adj=0.211, (0 split)
## 
## Node number 4: 1721 observations,    complexity param=0.0181388
##   predicted class=0  expected loss=0.09761766  P(node) =0.534472
##     class counts:  1553   168
##    probabilities: 0.902 0.098 
##   left son=8 (1650 obs) right son=9 (71 obs)
##   Primary splits:
##       X0.15   < 0.01   to the left,  improve=47.17248, (0 missing)
##       X0.44   < 0.1675 to the left,  improve=44.56346, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve=22.88287, (0 missing)
##       X0.14   < 0.135  to the left,  improve=16.86471, (0 missing)
##       X0.96   < 3.815  to the left,  improve=15.98877, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.815  to the left,  agree=0.959, adj=0.014, (0 split)
##       X278  < 7241.5 to the left,  agree=0.959, adj=0.014, (0 split)
## 
## Node number 5: 137 observations
##   predicted class=1  expected loss=0.1605839  P(node) =0.04254658
##     class counts:    22   115
##    probabilities: 0.161 0.839 
## 
## Node number 6: 526 observations,    complexity param=0.06782334
##   predicted class=0  expected loss=0.4220532  P(node) =0.163354
##     class counts:   304   222
##    probabilities: 0.578 0.422 
##   left son=12 (386 obs) right son=13 (140 obs)
##   Primary splits:
##       X0.32.1 < 0.1    to the left,  improve=60.85818, (0 missing)
##       X0.3    < 0.09   to the left,  improve=46.85309, (0 missing)
##       X0.44   < 0.0125 to the left,  improve=45.41479, (0 missing)
##       X0.96   < 0.4    to the left,  improve=34.61798, (0 missing)
##       X0.14   < 0.17   to the left,  improve=31.18934, (0 missing)
##   Surrogate splits:
##       X0.3  < 0.19   to the left,  agree=0.764, adj=0.114, (0 split)
##       X0.7  < 0.3    to the left,  agree=0.760, adj=0.100, (0 split)
##       X0.15 < 0.175  to the left,  agree=0.753, adj=0.071, (0 split)
##       X0.14 < 0.065  to the left,  agree=0.747, adj=0.050, (0 split)
##       X0.4  < 1.22   to the left,  agree=0.743, adj=0.036, (0 split)
## 
## Node number 7: 836 observations,    complexity param=0.01419558
##   predicted class=1  expected loss=0.08732057  P(node) =0.2596273
##     class counts:    73   763
##    probabilities: 0.087 0.913 
##   left son=14 (26 obs) right son=15 (810 obs)
##   Primary splits:
##       X0.16 < 0.39   to the right, improve=30.90419, (0 missing)
##       X0.17 < 0.13   to the right, improve=22.47071, (0 missing)
##       X0.18 < 0.45   to the right, improve=18.56998, (0 missing)
##       X0.21 < 0.03   to the right, improve=16.87615, (0 missing)
##       X0.38 < 0.52   to the right, improve=15.15688, (0 missing)
##   Surrogate splits:
##       X0.17 < 0.215  to the right, agree=0.986, adj=0.538, (0 split)
##       X0.21 < 0.03   to the right, agree=0.978, adj=0.308, (0 split)
##       X0.18 < 0.45   to the right, agree=0.972, adj=0.115, (0 split)
##       X0.20 < 0.105  to the right, agree=0.972, adj=0.115, (0 split)
##       X0.19 < 0.705  to the right, agree=0.971, adj=0.077, (0 split)
## 
## Node number 8: 1650 observations
##   predicted class=0  expected loss=0.07333333  P(node) =0.5124224
##     class counts:  1529   121
##    probabilities: 0.927 0.073 
## 
## Node number 9: 71 observations
##   predicted class=1  expected loss=0.3380282  P(node) =0.02204969
##     class counts:    24    47
##    probabilities: 0.338 0.662 
## 
## Node number 12: 386 observations,    complexity param=0.02208202
##   predicted class=0  expected loss=0.2772021  P(node) =0.1198758
##     class counts:   279   107
##    probabilities: 0.723 0.277 
##   left son=24 (336 obs) right son=25 (50 obs)
##   Primary splits:
##       X0.44 < 0.1065 to the left,  improve=29.04257, (0 missing)
##       X0.3  < 0.24   to the left,  improve=22.60497, (0 missing)
##       X0.14 < 0.23   to the left,  improve=22.03941, (0 missing)
##       X0.96 < 0.615  to the left,  improve=20.87466, (0 missing)
##       X0.4  < 0.3    to the left,  improve=20.57106, (0 missing)
##   Surrogate splits:
##       X0.14 < 1.045  to the left,  agree=0.902, adj=0.24, (0 split)
##       X0.2  < 1.175  to the left,  agree=0.896, adj=0.20, (0 split)
##       X0.10 < 2.005  to the left,  agree=0.878, adj=0.06, (0 split)
##       X0.11 < 1.415  to the left,  agree=0.876, adj=0.04, (0 split)
##       X0.4  < 0.28   to the left,  agree=0.873, adj=0.02, (0 split)
## 
## Node number 13: 140 observations
##   predicted class=1  expected loss=0.1785714  P(node) =0.04347826
##     class counts:    25   115
##    probabilities: 0.179 0.821 
## 
## Node number 14: 26 observations
##   predicted class=0  expected loss=0.1538462  P(node) =0.008074534
##     class counts:    22     4
##    probabilities: 0.846 0.154 
## 
## Node number 15: 810 observations
##   predicted class=1  expected loss=0.06296296  P(node) =0.2515528
##     class counts:    51   759
##    probabilities: 0.063 0.937 
## 
## Node number 24: 336 observations,    complexity param=0.01025237
##   predicted class=0  expected loss=0.202381  P(node) =0.1043478
##     class counts:   268    68
##    probabilities: 0.798 0.202 
##   left son=48 (313 obs) right son=49 (23 obs)
##   Primary splits:
##       X0.3   < 0.09   to the left,  improve=16.624540, (0 missing)
##       X0.11  < 0.07   to the left,  improve=12.322880, (0 missing)
##       X3.756 < 3.5755 to the left,  improve=11.766710, (0 missing)
##       X0.4   < 0.08   to the left,  improve= 8.852514, (0 missing)
##       X0.32  < 0.66   to the left,  improve= 8.530257, (0 missing)
##   Surrogate splits:
##       X0.12 < 0.56   to the left,  agree=0.938, adj=0.087, (0 split)
## 
## Node number 25: 50 observations
##   predicted class=1  expected loss=0.22  P(node) =0.01552795
##     class counts:    11    39
##    probabilities: 0.220 0.780 
## 
## Node number 48: 313 observations
##   predicted class=0  expected loss=0.1597444  P(node) =0.09720497
##     class counts:   263    50
##    probabilities: 0.840 0.160 
## 
## Node number 49: 23 observations
##   predicted class=1  expected loss=0.2173913  P(node) =0.007142857
##     class counts:     5    18
##    probabilities: 0.217 0.783
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)
y <- predict(tree1,type="prob",newdata=dataTE) #class probabilities; predict.rpart has no method argument

y1 <- rep(0,length(dataTE$X1))
y1[y[,1]>0.5] <- 0
y1[!(y[,1]>0.5)] <- 1
table(y1,dataTE$X1)
##    
## y1    0   1
##   0 777  94
##   1  59 450
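The CP table printed by `summary(tree1)` for maxdepth=6 reports the 10-fold cross-validated error (`xerror`) and its standard error (`xstd`) for each tree size. One common heuristic for picking a tree size from such a table is the one-standard-error rule. The sketch below applies it to the numbers copied from the output above; it is not a step the original analysis performs, and the names `cp_table`, `best` and `one_se` are introduced here for illustration. Since `xerror` depends on the random folds, the selected row may change between runs.

```r
# CP table copied from the summary(tree1) output for maxdepth = 6.
cp_table <- data.frame(
  CP     = c(0.47949527, 0.07334385, 0.06782334, 0.02208202,
             0.01813880, 0.01419558, 0.01025237, 0.01000000),
  nsplit = c(0, 1, 2, 4, 5, 6, 7, 8),
  xerror = c(1.0000000, 0.5354890, 0.4487382, 0.3359621,
             0.3272871, 0.3036278, 0.2768139, 0.2728707),
  xstd   = c(0.02186514, 0.01825534, 0.01706923, 0.01516250,
             0.01499489, 0.01451977, 0.01394671, 0.01385907)
)

# Row with the smallest cross-validated error.
best <- which.min(cp_table$xerror)

# One-standard-error rule: the smallest tree (rows are ordered by
# increasing nsplit) whose xerror is within one xstd of the minimum.
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se <- min(which(cp_table$xerror <= threshold))

print(cp_table[c(best, one_se), ])
```

With a fitted tree, the selected `CP` value could then be passed to rpart's `prune()`.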
tree1=rpart(dataTR$X1~.,method="class",data=dataTR,maxdepth=3,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$X1 ~ ., data = dataTR, method = "class", 
##     maxdepth = 3, xval = 10)
##   n= 3220 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.47949527      0 1.0000000 1.0000000 0.02186514
## 2 0.07334385      1 0.5205047 0.5244479 0.01811587
## 3 0.06782334      2 0.4471609 0.4179811 0.01659460
## 4 0.01813880      4 0.3115142 0.3312303 0.01507150
## 5 0.01419558      5 0.2933754 0.3130915 0.01471308
## 6 0.01000000      6 0.2791798 0.3012618 0.01447074
## 
## Variable importance
##  X0.778     X61 X0.32.1   X0.44   X0.96    X0.3 X0.64.1  X3.756    X278   X0.15 
##      25      14      10      10      10       7       6       5       5       2 
##   X0.32   X0.16   X0.17   X0.14 
##       2       1       1       1 
## 
## Node number 1: 3220 observations,    complexity param=0.4794953
##   predicted class=0  expected loss=0.3937888  P(node) =1
##     class counts:  1952  1268
##    probabilities: 0.606 0.394 
##   left son=2 (1858 obs) right son=3 (1362 obs)
##   Primary splits:
##       X0.778  < 0.0785 to the left,  improve=512.2678, (0 missing)
##       X0.44   < 0.0555 to the left,  improve=499.2630, (0 missing)
##       X0.3    < 0.01   to the left,  improve=434.2978, (0 missing)
##       X0.32.1 < 0.095  to the left,  improve=404.5992, (0 missing)
##       X0.96   < 0.405  to the left,  improve=398.9498, (0 missing)
##   Surrogate splits:
##       X0.44   < 0.0455 to the left,  agree=0.707, adj=0.308, (0 split)
##       X0.32.1 < 0.095  to the left,  agree=0.706, adj=0.304, (0 split)
##       X0.96   < 0.465  to the left,  agree=0.704, adj=0.300, (0 split)
##       X61     < 47.5   to the left,  agree=0.689, adj=0.264, (0 split)
##       X0.64.1 < 0.265  to the left,  agree=0.684, adj=0.252, (0 split)
## 
## Node number 2: 1858 observations,    complexity param=0.07334385
##   predicted class=0  expected loss=0.1523143  P(node) =0.5770186
##     class counts:  1575   283
##    probabilities: 0.848 0.152 
##   left son=4 (1721 obs) right son=5 (137 obs)
##   Primary splits:
##       X0.3    < 0.02   to the left,  improve=139.65530, (0 missing)
##       X0.44   < 0.081  to the left,  improve=100.73830, (0 missing)
##       X0.15   < 0.01   to the left,  improve= 91.18909, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve= 67.22638, (0 missing)
##       X0.96   < 0.985  to the left,  improve= 48.45695, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.485  to the left,  agree=0.930, adj=0.051, (0 split)
##       X0.45 < 1.063  to the left,  agree=0.929, adj=0.044, (0 split)
##       X0.12 < 2.375  to the left,  agree=0.927, adj=0.015, (0 split)
##       X0.11 < 2.14   to the left,  agree=0.927, adj=0.007, (0 split)
##       X0.13 < 8.945  to the left,  agree=0.927, adj=0.007, (0 split)
## 
## Node number 3: 1362 observations,    complexity param=0.06782334
##   predicted class=1  expected loss=0.2767988  P(node) =0.4229814
##     class counts:   377   985
##    probabilities: 0.277 0.723 
##   left son=6 (526 obs) right son=7 (836 obs)
##   Primary splits:
##       X61    < 18.5   to the left,  improve=155.4341, (0 missing)
##       X3.756 < 2.292  to the left,  improve=154.9768, (0 missing)
##       X0.44  < 0.0065 to the left,  improve=140.6588, (0 missing)
##       X278   < 81.5   to the left,  improve=137.7187, (0 missing)
##       X0.96  < 0.395  to the left,  improve=121.1891, (0 missing)
##   Surrogate splits:
##       X3.756 < 2.323  to the left,  agree=0.874, adj=0.675, (0 split)
##       X278   < 105.5  to the left,  agree=0.858, adj=0.631, (0 split)
##       X0.96  < 0.15   to the left,  agree=0.726, adj=0.291, (0 split)
##       X0.44  < 0.0065 to the left,  agree=0.725, adj=0.287, (0 split)
##       X0.32  < 0.01   to the left,  agree=0.695, adj=0.211, (0 split)
## 
## Node number 4: 1721 observations,    complexity param=0.0181388
##   predicted class=0  expected loss=0.09761766  P(node) =0.534472
##     class counts:  1553   168
##    probabilities: 0.902 0.098 
##   left son=8 (1650 obs) right son=9 (71 obs)
##   Primary splits:
##       X0.15   < 0.01   to the left,  improve=47.17248, (0 missing)
##       X0.44   < 0.1675 to the left,  improve=44.56346, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve=22.88287, (0 missing)
##       X0.14   < 0.135  to the left,  improve=16.86471, (0 missing)
##       X0.96   < 3.815  to the left,  improve=15.98877, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.815  to the left,  agree=0.959, adj=0.014, (0 split)
##       X278  < 7241.5 to the left,  agree=0.959, adj=0.014, (0 split)
## 
## Node number 5: 137 observations
##   predicted class=1  expected loss=0.1605839  P(node) =0.04254658
##     class counts:    22   115
##    probabilities: 0.161 0.839 
## 
## Node number 6: 526 observations,    complexity param=0.06782334
##   predicted class=0  expected loss=0.4220532  P(node) =0.163354
##     class counts:   304   222
##    probabilities: 0.578 0.422 
##   left son=12 (386 obs) right son=13 (140 obs)
##   Primary splits:
##       X0.32.1 < 0.1    to the left,  improve=60.85818, (0 missing)
##       X0.3    < 0.09   to the left,  improve=46.85309, (0 missing)
##       X0.44   < 0.0125 to the left,  improve=45.41479, (0 missing)
##       X0.96   < 0.4    to the left,  improve=34.61798, (0 missing)
##       X0.14   < 0.17   to the left,  improve=31.18934, (0 missing)
##   Surrogate splits:
##       X0.3  < 0.19   to the left,  agree=0.764, adj=0.114, (0 split)
##       X0.7  < 0.3    to the left,  agree=0.760, adj=0.100, (0 split)
##       X0.15 < 0.175  to the left,  agree=0.753, adj=0.071, (0 split)
##       X0.14 < 0.065  to the left,  agree=0.747, adj=0.050, (0 split)
##       X0.4  < 1.22   to the left,  agree=0.743, adj=0.036, (0 split)
## 
## Node number 7: 836 observations,    complexity param=0.01419558
##   predicted class=1  expected loss=0.08732057  P(node) =0.2596273
##     class counts:    73   763
##    probabilities: 0.087 0.913 
##   left son=14 (26 obs) right son=15 (810 obs)
##   Primary splits:
##       X0.16 < 0.39   to the right, improve=30.90419, (0 missing)
##       X0.17 < 0.13   to the right, improve=22.47071, (0 missing)
##       X0.18 < 0.45   to the right, improve=18.56998, (0 missing)
##       X0.21 < 0.03   to the right, improve=16.87615, (0 missing)
##       X0.38 < 0.52   to the right, improve=15.15688, (0 missing)
##   Surrogate splits:
##       X0.17 < 0.215  to the right, agree=0.986, adj=0.538, (0 split)
##       X0.21 < 0.03   to the right, agree=0.978, adj=0.308, (0 split)
##       X0.18 < 0.45   to the right, agree=0.972, adj=0.115, (0 split)
##       X0.20 < 0.105  to the right, agree=0.972, adj=0.115, (0 split)
##       X0.19 < 0.705  to the right, agree=0.971, adj=0.077, (0 split)
## 
## Node number 8: 1650 observations
##   predicted class=0  expected loss=0.07333333  P(node) =0.5124224
##     class counts:  1529   121
##    probabilities: 0.927 0.073 
## 
## Node number 9: 71 observations
##   predicted class=1  expected loss=0.3380282  P(node) =0.02204969
##     class counts:    24    47
##    probabilities: 0.338 0.662 
## 
## Node number 12: 386 observations
##   predicted class=0  expected loss=0.2772021  P(node) =0.1198758
##     class counts:   279   107
##    probabilities: 0.723 0.277 
## 
## Node number 13: 140 observations
##   predicted class=1  expected loss=0.1785714  P(node) =0.04347826
##     class counts:    25   115
##    probabilities: 0.179 0.821 
## 
## Node number 14: 26 observations
##   predicted class=0  expected loss=0.1538462  P(node) =0.008074534
##     class counts:    22     4
##    probabilities: 0.846 0.154 
## 
## Node number 15: 810 observations
##   predicted class=1  expected loss=0.06296296  P(node) =0.2515528
##     class counts:    51   759
##    probabilities: 0.063 0.937
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

y <- predict(tree1,type="prob",newdata=dataTE) #class probabilities; predict.rpart has no method argument

y1 <- rep(0,length(dataTE$X1))
y1[y[,1]>0.5] <- 0
y1[!(y[,1]>0.5)] <- 1
table(y1,dataTE$X1)
##    
## y1    0   1
##   0 780 125
##   1  56 419
tree1=rpart(dataTR$X1~.,method="class",data=dataTR,maxdepth=14,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$X1 ~ ., data = dataTR, method = "class", 
##     maxdepth = 14, xval = 10)
##   n= 3220 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.47949527      0 1.0000000 1.0000000 0.02186514
## 2 0.07334385      1 0.5205047 0.5339117 0.01823561
## 3 0.06782334      2 0.4471609 0.4629338 0.01727815
## 4 0.02208202      4 0.3115142 0.3249211 0.01494858
## 5 0.01813880      5 0.2894322 0.3162461 0.01477653
## 6 0.01419558      6 0.2712934 0.2854890 0.01413638
## 7 0.01025237      7 0.2570978 0.2657729 0.01369904
## 8 0.01000000      8 0.2468454 0.2618297 0.01360882
## 
## Variable importance
##  X0.778     X61   X0.44 X0.32.1   X0.96    X0.3 X0.64.1  X3.756    X278   X0.15 
##      24      14      11      10       9       8       6       5       5       2 
##   X0.32   X0.16   X0.14   X0.17 
##       2       1       1       1 
## 
## Node number 1: 3220 observations,    complexity param=0.4794953
##   predicted class=0  expected loss=0.3937888  P(node) =1
##     class counts:  1952  1268
##    probabilities: 0.606 0.394 
##   left son=2 (1858 obs) right son=3 (1362 obs)
##   Primary splits:
##       X0.778  < 0.0785 to the left,  improve=512.2678, (0 missing)
##       X0.44   < 0.0555 to the left,  improve=499.2630, (0 missing)
##       X0.3    < 0.01   to the left,  improve=434.2978, (0 missing)
##       X0.32.1 < 0.095  to the left,  improve=404.5992, (0 missing)
##       X0.96   < 0.405  to the left,  improve=398.9498, (0 missing)
##   Surrogate splits:
##       X0.44   < 0.0455 to the left,  agree=0.707, adj=0.308, (0 split)
##       X0.32.1 < 0.095  to the left,  agree=0.706, adj=0.304, (0 split)
##       X0.96   < 0.465  to the left,  agree=0.704, adj=0.300, (0 split)
##       X61     < 47.5   to the left,  agree=0.689, adj=0.264, (0 split)
##       X0.64.1 < 0.265  to the left,  agree=0.684, adj=0.252, (0 split)
## 
## Node number 2: 1858 observations,    complexity param=0.07334385
##   predicted class=0  expected loss=0.1523143  P(node) =0.5770186
##     class counts:  1575   283
##    probabilities: 0.848 0.152 
##   left son=4 (1721 obs) right son=5 (137 obs)
##   Primary splits:
##       X0.3    < 0.02   to the left,  improve=139.65530, (0 missing)
##       X0.44   < 0.081  to the left,  improve=100.73830, (0 missing)
##       X0.15   < 0.01   to the left,  improve= 91.18909, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve= 67.22638, (0 missing)
##       X0.96   < 0.985  to the left,  improve= 48.45695, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.485  to the left,  agree=0.930, adj=0.051, (0 split)
##       X0.45 < 1.063  to the left,  agree=0.929, adj=0.044, (0 split)
##       X0.12 < 2.375  to the left,  agree=0.927, adj=0.015, (0 split)
##       X0.11 < 2.14   to the left,  agree=0.927, adj=0.007, (0 split)
##       X0.13 < 8.945  to the left,  agree=0.927, adj=0.007, (0 split)
## 
## Node number 3: 1362 observations,    complexity param=0.06782334
##   predicted class=1  expected loss=0.2767988  P(node) =0.4229814
##     class counts:   377   985
##    probabilities: 0.277 0.723 
##   left son=6 (526 obs) right son=7 (836 obs)
##   Primary splits:
##       X61    < 18.5   to the left,  improve=155.4341, (0 missing)
##       X3.756 < 2.292  to the left,  improve=154.9768, (0 missing)
##       X0.44  < 0.0065 to the left,  improve=140.6588, (0 missing)
##       X278   < 81.5   to the left,  improve=137.7187, (0 missing)
##       X0.96  < 0.395  to the left,  improve=121.1891, (0 missing)
##   Surrogate splits:
##       X3.756 < 2.323  to the left,  agree=0.874, adj=0.675, (0 split)
##       X278   < 105.5  to the left,  agree=0.858, adj=0.631, (0 split)
##       X0.96  < 0.15   to the left,  agree=0.726, adj=0.291, (0 split)
##       X0.44  < 0.0065 to the left,  agree=0.725, adj=0.287, (0 split)
##       X0.32  < 0.01   to the left,  agree=0.695, adj=0.211, (0 split)
## 
## Node number 4: 1721 observations,    complexity param=0.0181388
##   predicted class=0  expected loss=0.09761766  P(node) =0.534472
##     class counts:  1553   168
##    probabilities: 0.902 0.098 
##   left son=8 (1650 obs) right son=9 (71 obs)
##   Primary splits:
##       X0.15   < 0.01   to the left,  improve=47.17248, (0 missing)
##       X0.44   < 0.1675 to the left,  improve=44.56346, (0 missing)
##       X0.32.1 < 0.135  to the left,  improve=22.88287, (0 missing)
##       X0.14   < 0.135  to the left,  improve=16.86471, (0 missing)
##       X0.96   < 3.815  to the left,  improve=15.98877, (0 missing)
##   Surrogate splits:
##       X0.14 < 0.815  to the left,  agree=0.959, adj=0.014, (0 split)
##       X278  < 7241.5 to the left,  agree=0.959, adj=0.014, (0 split)
## 
## Node number 5: 137 observations
##   predicted class=1  expected loss=0.1605839  P(node) =0.04254658
##     class counts:    22   115
##    probabilities: 0.161 0.839 
## 
## Node number 6: 526 observations,    complexity param=0.06782334
##   predicted class=0  expected loss=0.4220532  P(node) =0.163354
##     class counts:   304   222
##    probabilities: 0.578 0.422 
##   left son=12 (386 obs) right son=13 (140 obs)
##   Primary splits:
##       X0.32.1 < 0.1    to the left,  improve=60.85818, (0 missing)
##       X0.3    < 0.09   to the left,  improve=46.85309, (0 missing)
##       X0.44   < 0.0125 to the left,  improve=45.41479, (0 missing)
##       X0.96   < 0.4    to the left,  improve=34.61798, (0 missing)
##       X0.14   < 0.17   to the left,  improve=31.18934, (0 missing)
##   Surrogate splits:
##       X0.3  < 0.19   to the left,  agree=0.764, adj=0.114, (0 split)
##       X0.7  < 0.3    to the left,  agree=0.760, adj=0.100, (0 split)
##       X0.15 < 0.175  to the left,  agree=0.753, adj=0.071, (0 split)
##       X0.14 < 0.065  to the left,  agree=0.747, adj=0.050, (0 split)
##       X0.4  < 1.22   to the left,  agree=0.743, adj=0.036, (0 split)
## 
## Node number 7: 836 observations,    complexity param=0.01419558
##   predicted class=1  expected loss=0.08732057  P(node) =0.2596273
##     class counts:    73   763
##    probabilities: 0.087 0.913 
##   left son=14 (26 obs) right son=15 (810 obs)
##   Primary splits:
##       X0.16 < 0.39   to the right, improve=30.90419, (0 missing)
##       X0.17 < 0.13   to the right, improve=22.47071, (0 missing)
##       X0.18 < 0.45   to the right, improve=18.56998, (0 missing)
##       X0.21 < 0.03   to the right, improve=16.87615, (0 missing)
##       X0.38 < 0.52   to the right, improve=15.15688, (0 missing)
##   Surrogate splits:
##       X0.17 < 0.215  to the right, agree=0.986, adj=0.538, (0 split)
##       X0.21 < 0.03   to the right, agree=0.978, adj=0.308, (0 split)
##       X0.18 < 0.45   to the right, agree=0.972, adj=0.115, (0 split)
##       X0.20 < 0.105  to the right, agree=0.972, adj=0.115, (0 split)
##       X0.19 < 0.705  to the right, agree=0.971, adj=0.077, (0 split)
## 
## Node number 8: 1650 observations
##   predicted class=0  expected loss=0.07333333  P(node) =0.5124224
##     class counts:  1529   121
##    probabilities: 0.927 0.073 
## 
## Node number 9: 71 observations
##   predicted class=1  expected loss=0.3380282  P(node) =0.02204969
##     class counts:    24    47
##    probabilities: 0.338 0.662 
## 
## Node number 12: 386 observations,    complexity param=0.02208202
##   predicted class=0  expected loss=0.2772021  P(node) =0.1198758
##     class counts:   279   107
##    probabilities: 0.723 0.277 
##   left son=24 (336 obs) right son=25 (50 obs)
##   Primary splits:
##       X0.44 < 0.1065 to the left,  improve=29.04257, (0 missing)
##       X0.3  < 0.24   to the left,  improve=22.60497, (0 missing)
##       X0.14 < 0.23   to the left,  improve=22.03941, (0 missing)
##       X0.96 < 0.615  to the left,  improve=20.87466, (0 missing)
##       X0.4  < 0.3    to the left,  improve=20.57106, (0 missing)
##   Surrogate splits:
##       X0.14 < 1.045  to the left,  agree=0.902, adj=0.24, (0 split)
##       X0.2  < 1.175  to the left,  agree=0.896, adj=0.20, (0 split)
##       X0.10 < 2.005  to the left,  agree=0.878, adj=0.06, (0 split)
##       X0.11 < 1.415  to the left,  agree=0.876, adj=0.04, (0 split)
##       X0.4  < 0.28   to the left,  agree=0.873, adj=0.02, (0 split)
## 
## Node number 13: 140 observations
##   predicted class=1  expected loss=0.1785714  P(node) =0.04347826
##     class counts:    25   115
##    probabilities: 0.179 0.821 
## 
## Node number 14: 26 observations
##   predicted class=0  expected loss=0.1538462  P(node) =0.008074534
##     class counts:    22     4
##    probabilities: 0.846 0.154 
## 
## Node number 15: 810 observations
##   predicted class=1  expected loss=0.06296296  P(node) =0.2515528
##     class counts:    51   759
##    probabilities: 0.063 0.937 
## 
## Node number 24: 336 observations,    complexity param=0.01025237
##   predicted class=0  expected loss=0.202381  P(node) =0.1043478
##     class counts:   268    68
##    probabilities: 0.798 0.202 
##   left son=48 (313 obs) right son=49 (23 obs)
##   Primary splits:
##       X0.3   < 0.09   to the left,  improve=16.624540, (0 missing)
##       X0.11  < 0.07   to the left,  improve=12.322880, (0 missing)
##       X3.756 < 3.5755 to the left,  improve=11.766710, (0 missing)
##       X0.4   < 0.08   to the left,  improve= 8.852514, (0 missing)
##       X0.32  < 0.66   to the left,  improve= 8.530257, (0 missing)
##   Surrogate splits:
##       X0.12 < 0.56   to the left,  agree=0.938, adj=0.087, (0 split)
## 
## Node number 25: 50 observations
##   predicted class=1  expected loss=0.22  P(node) =0.01552795
##     class counts:    11    39
##    probabilities: 0.220 0.780 
## 
## Node number 48: 313 observations
##   predicted class=0  expected loss=0.1597444  P(node) =0.09720497
##     class counts:   263    50
##    probabilities: 0.840 0.160 
## 
## Node number 49: 23 observations
##   predicted class=1  expected loss=0.2173913  P(node) =0.007142857
##     class counts:     5    18
##    probabilities: 0.217 0.783
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

y <- predict(tree1,type="prob",newdata=dataTE) #class probabilities; predict.rpart has no method argument

y1 <- rep(0,length(dataTE$X1))
y1[y[,1]>0.5] <- 0
y1[!(y[,1]>0.5)] <- 1
table(y1,dataTE$X1)
##    
## y1    0   1
##   0 777  94
##   1  59 450
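#For the three runs shown here, maxdepth=3 gives the lowest test accuracy (1199/1380, about 0.869).
#maxdepth=6 and maxdepth=14 give identical confusion matrices and a higher accuracy (1227/1380, about 0.889).
#maxdepth=3 yields the simpler tree, but it is not the most accurate one on the test set.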
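The comparison between maxdepth settings can be checked directly from the confusion matrices. A minimal sketch in base R, with the counts copied from the `table(y1,dataTE$X1)` outputs above; the helper name `accuracy` and the list `cm` are introduced here for illustration:

```r
# Confusion matrices copied from the table() outputs above
# (rows = predicted class, columns = true class).
labels <- list(pred = c("0", "1"), truth = c("0", "1"))
cm <- list(
  maxdepth_6  = matrix(c(777, 59,  94, 450), nrow = 2, dimnames = labels),
  maxdepth_3  = matrix(c(780, 56, 125, 419), nrow = 2, dimnames = labels),
  maxdepth_14 = matrix(c(777, 59,  94, 450), nrow = 2, dimnames = labels)
)

# Accuracy = correctly classified instances / all test instances.
accuracy <- function(m) sum(diag(m)) / sum(m)

acc <- sapply(cm, accuracy)
print(round(acc, 4))
```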

###RF

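#Note: trControl is an argument of caret::train(), not of randomForest(); it appears to have no effect here, so no 10-fold cross-validation is run. The error curve from plot() is the out-of-bag estimate.
rf.spambase=randomForest(X1~.,data=dataTR,mtry=4,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))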
plot(rf.spambase)

varImpPlot(rf.spambase)

pred.spambase = predict(rf.spambase,newdata=dataTE)
table(pred.spambase,dataTE$X1)
##              
## pred.spambase   0   1
##             0 806  52
##             1  30 492
rf.spambase=randomForest(X1~.,data=dataTR,mtry=3,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))
plot(rf.spambase)

varImpPlot(rf.spambase)

pred.spambase = predict(rf.spambase,newdata=dataTE)
table(pred.spambase,dataTE$X1)
##              
## pred.spambase   0   1
##             0 807  49
##             1  29 495
rf.spambase=randomForest(X1~.,data=dataTR,mtry=5,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))
plot(rf.spambase)

varImpPlot(rf.spambase)

pred.spambase = predict(rf.spambase,newdata=dataTE)
table(pred.spambase,dataTE$X1)
##              
## pred.spambase   0   1
##             0 804  48
##             1  32 496
rf.spambase=randomForest(X1~.,data=dataTR,mtry=6,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))
plot(rf.spambase)

varImpPlot(rf.spambase)

pred.spambase = predict(rf.spambase,newdata=dataTE)
table(pred.spambase,dataTE$X1)
##              
## pred.spambase   0   1
##             0 807  48
##             1  29 496
rf.spambase=randomForest(X1~.,data=dataTR,mtry=7,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))
plot(rf.spambase)

varImpPlot(rf.spambase)

pred.spambase = predict(rf.spambase,newdata=dataTE)
table(pred.spambase,dataTE$X1)
##              
## pred.spambase   0   1
##             0 808  45
##             1  28 499
rf.spambase=randomForest(X1~.,data=dataTR,mtry=4,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))
plot(rf.spambase)

varImpPlot(rf.spambase)

pred.spambase = predict(rf.spambase,newdata=dataTE)
table(pred.spambase,dataTE$X1)
##              
## pred.spambase   0   1
##             0 804  54
##             1  32 490
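#mtry=5, 6 and 7 give nearly the same test accuracy (1300, 1303 and 1307 correct out of 1380, about 0.942 to 0.947); mtry=7 is marginally the best of the values tried.
#The two mtry=4 runs differ by a similar amount (1298 vs 1294 correct), so these differences are within run-to-run noise.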
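To put the differences between the random forest runs in perspective, the sketch below tabulates test accuracy per run from the confusion matrices above and compares the spread with a rough binomial standard error. The standard error is only an approximation (all runs share the same test set), and the names `rf_runs` and `se` are introduced here for illustration.

```r
# Correctly classified test instances per run, in the order the runs
# appear above (mtry = 4 was run twice). Each test set has 1380 rows.
rf_runs <- data.frame(
  mtry    = c(4, 3, 5, 6, 7, 4),
  correct = c(806 + 492, 807 + 495, 804 + 496,
              807 + 496, 808 + 499, 804 + 490)
)
n_test <- 1380
rf_runs$accuracy <- rf_runs$correct / n_test

# Rough binomial standard error of an accuracy near the observed mean.
p  <- mean(rf_runs$accuracy)
se <- sqrt(p * (1 - p) / n_test)

print(rf_runs)
cat("spread:", round(diff(range(rf_runs$accuracy)), 4),
    " approx. SE:", round(se, 4), "\n")
```

The spread across all six runs is smaller than two such standard errors, which suggests that the choice of mtry among the values tried matters little on this test set.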

###gbm

noftrees=100
depth=5
learning_rate=0.2
sampling_fraction=0.5


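#boosting_model=gbm(X1~.,distribution="bernoulli", data=dataTR, n.trees = noftrees,interaction.depth = depth,cv.folds=10,class.stratify.cv=TRUE,
#                   n.minobsinnode = 5, shrinkage =learning_rate,
#                   bag.fraction = sampling_fraction)
#boosting_model
#summary(boosting_model)

#The R session usually aborts when the boosting model is fitted, so the call is left commented out and no gbm results are reported for this dataset.
#Note: gbm.fit() takes x and y rather than a formula and has no cv.folds argument, so the formula interface gbm() is used in the commented call above. The bernoulli distribution also expects a numeric 0/1 response.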

##WEC_Perth_49 Dataset. This dataset contains wave energy information indexed by the x and y coordinates of 49 distinct locations. The goal is to predict the total power production over all 49 locations. All variables are numeric.
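Because the target here is numeric, this is a regression problem, and the introduction states that regression fits are scored by root mean square error. A minimal sketch of that metric in base R follows; the function name `rmse` and the toy vectors are assumptions made for illustration and are not taken from WEC_Perth_49.csv.

```r
# Root mean square error between observed and predicted values.
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Made-up toy values, only to show the call.
actual    <- c(10, 12, 15, 20)
predicted <- c(11, 12, 13, 22)
print(rmse(actual, predicted))
```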

###WEC_Perth_49.csv
data <- data.table(read.csv("WEC_Perth_49.csv",stringsAsFactors=T))
str(data)
## Classes 'data.table' and 'data.frame':   36043 obs. of  149 variables:
##  $ X1         : num  600 593 593 593 200 600 600 400 800 800 ...
##  $ Y1         : num  0 12 12 12 0 0 0 0 0 0 ...
##  $ X2         : num  546 546 546 546 146 ...
##  $ Y2         : num  37.5 37.5 37.5 37.5 37.5 ...
##  $ X3         : num  489.8 489.8 489.8 489.8 89.8 ...
##  $ Y3         : num  74.9 74.9 74.9 74.9 74.9 ...
##  $ X4         : num  432.5 432.5 432.5 432.5 32.4 ...
##  $ Y4         : num  112 112 112 112 112 ...
##  $ X5         : num  650 644 644 644 400 200 800 600 1000 1000 ...
##  $ Y5         : num  0 8 8 8 0 200 0 0 0 0 ...
##  $ X6         : num  700 700 697 697 346 ...
##  $ Y6         : num  0 0 3 3 37.5 ...
##  $ X7         : num  750 750 750 750 290 ...
##  $ Y7         : num  0 0 0 0 74.9 ...
##  $ X8         : num  800 800 800 800 232 ...
##  $ Y8         : num  0 0 0 0 112 ...
##  $ X9         : num  850 850 850 850 800 600 1000 800 400 400 ...
##  $ Y9         : num  0 0 0 0 0 200 0 0 200 200 ...
##  $ X10        : num  900 900 900 900 746 ...
##  $ Y10        : num  0 0 0 0 37.5 ...
##  $ X11        : num  950 950 950 950 690 ...
##  $ Y11        : num  0 0 0 0 74.9 ...
##  $ X12        : num  1000 1000 1000 1000 632 ...
##  $ Y12        : num  0 0 0 0 112 ...
##  $ X13        : num  1000 1000 1000 1000 1000 400 200 1000 1000 800 ...
##  $ Y13        : num  200 200 200 200 0 400 200 0 200 200 ...
##  $ X14        : num  946 946 946 946 946 ...
##  $ Y14        : num  237.5 237.5 237.5 237.5 37.5 ...
##  $ X15        : num  890 890 890 890 890 ...
##  $ Y15        : num  274.9 274.9 274.9 274.9 74.9 ...
##  $ X16        : num  832 832 832 832 832 ...
##  $ Y16        : num  312 312 312 312 112 ...
##  $ X17        : num  200 200 200 200 200 800 800 600 200 1000 ...
##  $ Y17        : num  400 400 400 400 200 400 400 200 400 200 ...
##  $ X18        : num  146 146 146 146 146 ...
##  $ Y18        : num  438 438 438 438 238 ...
##  $ X19        : num  89.8 89.8 89.8 89.8 89.8 ...
##  $ Y19        : num  475 475 475 475 275 ...
##  $ X20        : num  0 0 0 0 32.4 ...
##  $ Y20        : num  612 612 612 612 312 ...
##  $ X21        : num  400 400 400 400 600 1000 1000 800 400 200 ...
##  $ Y21        : num  400 400 400 400 200 400 400 200 400 400 ...
##  $ X22        : num  346 346 346 346 546 ...
##  $ Y22        : num  438 438 438 438 238 ...
##  $ X23        : num  290 290 290 290 490 ...
##  $ Y23        : num  475 475 475 475 275 ...
##  $ X24        : num  232 232 232 251 432 ...
##  $ Y24        : num  512 512 512 511 312 ...
##  $ X25        : num  600 600 600 600 400 400 600 200 200 200 ...
##  $ Y25        : num  400 400 400 400 400 600 600 400 600 600 ...
##  $ X26        : num  546 546 546 546 346 ...
##  $ Y26        : num  438 438 438 438 438 ...
##  $ X27        : num  490 490 490 490 290 ...
##  $ Y27        : num  475 475 475 475 475 ...
##  $ X28        : num  432 432 432 432 232 ...
##  $ Y28        : num  512 512 512 512 512 ...
##  $ X29        : num  800 800 800 800 800 800 800 600 800 600 ...
##  $ Y29        : num  400 400 400 400 400 600 600 600 600 600 ...
##  $ X30        : num  746 746 746 746 746 ...
##  $ Y30        : num  438 438 438 438 438 ...
##  $ X31        : num  690 690 690 690 690 ...
##  $ Y31        : num  475 475 475 475 475 ...
##  $ X32        : num  632 632 632 632 632 ...
##  $ Y32        : num  512 512 512 512 512 ...
##  $ X33        : num  200 197 197 197 400 1000 1000 800 1000 800 ...
##  $ Y33        : num  600 559 559 559 600 600 600 600 600 600 ...
##  $ X34        : num  146 146 146 146 346 ...
##  $ Y34        : num  638 638 638 638 638 ...
##  $ X35        : num  89.8 89.8 89.8 89.8 289.8 ...
##  $ Y35        : num  675 675 675 675 675 ...
##  $ X36        : num  0 0 0 0 232 ...
##  $ Y36        : num  762 762 762 762 712 ...
##  $ X37        : num  600 600 600 600 1000 200 200 200 200 200 ...
##  $ Y37        : num  600 600 600 600 600 800 800 800 800 800 ...
##  $ X38        : num  546 546 546 546 946 ...
##  $ Y38        : num  638 638 638 638 638 ...
##  $ X39        : num  490 490 490 490 890 ...
##  $ Y39        : num  675 675 675 675 675 ...
##  $ X40        : num  432 432 432 432 832 ...
##  $ Y40        : num  712 712 712 712 712 ...
##  $ X41        : num  200 204 204 204 600 400 600 400 400 400 ...
##  $ Y41        : num  800 807 807 807 800 800 800 800 800 800 ...
##  $ X42        : num  146 146 146 146 546 ...
##  $ Y42        : num  838 838 838 838 838 ...
##  $ X43        : num  89.8 89.8 89.8 89.8 489.8 ...
##  $ Y43        : num  875 875 875 875 875 ...
##  $ X44        : num  32.5 32.5 32.5 32.5 432.4 ...
##  $ Y44        : num  912 912 912 912 912 ...
##  $ X45        : num  400 400 400 400 800 600 800 800 800 800 ...
##  $ Y45        : num  800 800 800 800 800 800 800 800 800 800 ...
##  $ X46        : num  346 346 346 346 746 ...
##  $ Y46        : num  838 838 838 838 838 ...
##  $ X47        : num  290 290 290 290 690 ...
##  $ Y47        : num  875 875 875 875 875 ...
##  $ X48        : num  232 232 232 232 632 ...
##  $ Y48        : num  912 912 912 912 912 ...
##  $ X49        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Y49        : num  1010 1010 1010 1010 1010 1010 1010 1010 1010 1010 ...
##  $ Power1     : num  71265 72872 72724 72759 44620 ...
##   [list output truncated]
##  - attr(*, ".internal.selfref")=<externalptr>
data <- data[,-c(99:148)]
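Dropping columns by position works here, but it silently breaks if the column order of the file ever changes. A minimal sketch of a name-based alternative follows, assuming the coordinate columns follow the `X<n>`/`Y<n>` pattern seen in the `str()` output. The toy data frame, its values and the `Extra` column are hypothetical stand-ins, not the real file.

```r
# Toy data frame with the same naming scheme (hypothetical values,
# 2 converters instead of 49; `Extra` is a made-up non-coordinate column)
toy <- data.frame(X1 = c(600, 593), Y1 = c(0, 12),
                  X2 = c(546, 546), Y2 = c(37.5, 37.5),
                  Power1 = c(71265, 72872), Power2 = c(70000, 71000),
                  Extra = c(0.8, 0.8),
                  Total_Power = c(141265, 143872))

# Keep the coordinate columns (X<n>, Y<n>) and the target; drop everything else
keep <- grepl("^[XY][0-9]+$", names(toy)) | names(toy) == "Total_Power"
toy_reduced <- toy[, keep, drop = FALSE]
names(toy_reduced)
```

For a `data.table`, the selected names could be passed as a character vector together with `with = FALSE` instead of the base `data.frame` indexing used in this sketch.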
set.seed(582) # sample.split() comes from caTools; 70/30 train/test split on Total_Power
split=sample.split(data$Total_Power, SplitRatio=0.7)
dataTR=subset(data,split==TRUE)
dataTE=subset(data,split==FALSE)
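Since `Total_Power` is continuous, stratifying on it only makes sense over ranges of the target. The sketch below is one possible way to do that in base R and is not a description of what `sample.split` does internally: it bins a target into quintiles and draws 70% of the rows within each bin. The `target` vector is simulated and every name in the block is hypothetical.

```r
set.seed(582)

# Hypothetical continuous target standing in for Total_Power
target <- runif(1000, min = 3.6e6, max = 4.2e6)

# Bin the target into quintiles so that each bin can be sampled separately
breaks <- quantile(target, probs = seq(0, 1, by = 0.2))
bins <- cut(target, breaks = breaks, include.lowest = TRUE)

# Draw 70% of the row indices within every bin
train_idx <- unlist(lapply(split(seq_along(target), bins), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))

# Logical split vector, comparable in shape to the `split` vector above
is_train <- seq_along(target) %in% train_idx
table(bins, is_train)
```

With this construction the training and test sets cover the low, middle and high ranges of the target in the same 70/30 proportion.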
str(dataTR)
## Classes 'data.table' and 'data.frame':   26779 obs. of  99 variables:
##  $ X1         : num  593 593 593 200 600 600 400 800 800 800 ...
##  $ Y1         : num  12 12 12 0 0 0 0 0 0 0 ...
##  $ X2         : num  546 546 546 146 546 ...
##  $ Y2         : num  37.5 37.5 37.5 37.5 37.5 ...
##  $ X3         : num  489.8 489.8 489.8 89.8 489.8 ...
##  $ Y3         : num  74.9 74.9 74.9 74.9 74.9 ...
##  $ X4         : num  432.5 432.5 432.5 32.4 432.4 ...
##  $ Y4         : num  112 112 112 112 112 ...
##  $ X5         : num  644 644 644 400 200 800 600 1000 1000 1000 ...
##  $ Y5         : num  8 8 8 0 200 0 0 0 0 0 ...
##  $ X6         : num  700 697 697 346 146 ...
##  $ Y6         : num  0 3 3 37.5 237.5 ...
##  $ X7         : num  750 750 750 289.8 89.8 ...
##  $ Y7         : num  0 0 0 74.9 274.9 ...
##  $ X8         : num  800 800 800 232.4 32.4 ...
##  $ Y8         : num  0 0 0 112 312 ...
##  $ X9         : num  850 850 850 800 600 1000 800 400 400 400 ...
##  $ Y9         : num  0 0 0 0 200 0 0 200 200 200 ...
##  $ X10        : num  900 900 900 746 546 ...
##  $ Y10        : num  0 0 0 37.5 237.5 ...
##  $ X11        : num  950 950 950 690 490 ...
##  $ Y11        : num  0 0 0 74.9 274.9 ...
##  $ X12        : num  1000 1000 1000 632 432 ...
##  $ Y12        : num  0 0 0 112 312 ...
##  $ X13        : num  1000 1000 1000 1000 400 200 1000 1000 800 800 ...
##  $ Y13        : num  200 200 200 0 400 200 0 200 200 200 ...
##  $ X14        : num  946 946 946 946 346 ...
##  $ Y14        : num  237.5 237.5 237.5 37.5 437.5 ...
##  $ X15        : num  890 890 890 890 290 ...
##  $ Y15        : num  274.9 274.9 274.9 74.9 474.9 ...
##  $ X16        : num  832 832 832 832 232 ...
##  $ Y16        : num  312 312 312 112 512 ...
##  $ X17        : num  200 200 200 200 800 800 600 200 1000 1000 ...
##  $ Y17        : num  400 400 400 200 400 400 200 400 200 200 ...
##  $ X18        : num  146 146 146 146 746 ...
##  $ Y18        : num  438 438 438 238 438 ...
##  $ X19        : num  89.8 89.8 89.8 89.8 689.8 ...
##  $ Y19        : num  475 475 475 275 475 ...
##  $ X20        : num  0 0 0 32.4 632.4 ...
##  $ Y20        : num  612 612 612 312 512 ...
##  $ X21        : num  400 400 400 600 1000 1000 800 400 200 200 ...
##  $ Y21        : num  400 400 400 200 400 400 200 400 400 600 ...
##  $ X22        : num  346 346 346 546 946 ...
##  $ Y22        : num  438 438 438 238 438 ...
##  $ X23        : num  290 290 290 490 890 ...
##  $ Y23        : num  475 475 475 275 475 ...
##  $ X24        : num  232 232 251 432 832 ...
##  $ Y24        : num  512 512 511 312 512 ...
##  $ X25        : num  600 600 600 400 400 600 200 200 200 400 ...
##  $ Y25        : num  400 400 400 400 600 600 400 600 600 600 ...
##  $ X26        : num  546 546 546 346 346 ...
##  $ Y26        : num  438 438 438 438 638 ...
##  $ X27        : num  490 490 490 290 290 ...
##  $ Y27        : num  475 475 475 475 675 ...
##  $ X28        : num  432 432 432 232 232 ...
##  $ Y28        : num  512 512 512 512 712 ...
##  $ X29        : num  800 800 800 800 800 800 600 800 600 600 ...
##  $ Y29        : num  400 400 400 400 600 600 600 600 600 600 ...
##  $ X30        : num  746 746 746 746 746 ...
##  $ Y30        : num  438 438 438 438 638 ...
##  $ X31        : num  690 690 690 690 690 ...
##  $ Y31        : num  475 475 475 475 675 ...
##  $ X32        : num  632 632 632 632 632 ...
##  $ Y32        : num  512 512 512 512 712 ...
##  $ X33        : num  197 197 197 400 1000 1000 800 1000 800 800 ...
##  $ Y33        : num  559 559 559 600 600 600 600 600 600 600 ...
##  $ X34        : num  146 146 146 346 946 ...
##  $ Y34        : num  638 638 638 638 638 ...
##  $ X35        : num  89.8 89.8 89.8 289.8 889.8 ...
##  $ Y35        : num  675 675 675 675 675 ...
##  $ X36        : num  0 0 0 232 832 ...
##  $ Y36        : num  762 762 762 712 712 ...
##  $ X37        : num  600 600 600 1000 200 200 200 200 200 200 ...
##  $ Y37        : num  600 600 600 600 800 800 800 800 800 800 ...
##  $ X38        : num  546 546 546 946 146 ...
##  $ Y38        : num  638 638 638 638 838 ...
##  $ X39        : num  489.8 489.8 489.8 889.8 89.8 ...
##  $ Y39        : num  675 675 675 675 875 ...
##  $ X40        : num  432.5 432.5 432.5 832.4 32.4 ...
##  $ Y40        : num  712 712 712 712 912 ...
##  $ X41        : num  204 204 204 600 400 600 400 400 400 400 ...
##  $ Y41        : num  807 807 807 800 800 800 800 800 800 800 ...
##  $ X42        : num  146 146 146 546 346 ...
##  $ Y42        : num  838 838 838 838 838 ...
##  $ X43        : num  89.8 89.8 89.8 489.8 289.8 ...
##  $ Y43        : num  875 875 875 875 875 ...
##  $ X44        : num  32.5 32.5 32.5 432.4 232.4 ...
##  $ Y44        : num  912 912 912 912 912 ...
##  $ X45        : num  400 400 400 800 600 800 800 800 800 600 ...
##  $ Y45        : num  800 800 800 800 800 800 800 800 800 800 ...
##  $ X46        : num  346 346 346 746 546 ...
##  $ Y46        : num  838 838 838 838 838 ...
##  $ X47        : num  290 290 290 690 490 ...
##  $ Y47        : num  875 875 875 875 875 ...
##  $ X48        : num  232 232 232 632 432 ...
##  $ Y48        : num  912 912 912 912 912 ...
##  $ X49        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Y49        : num  1010 1010 1010 1010 1010 1010 1010 1010 1010 1010 ...
##  $ Total_Power: num  4103361 4103680 4105661 3752649 3820015 ...
##  - attr(*, ".internal.selfref")=<externalptr>
#knn
knnFit <- train(Total_Power ~ ., data = dataTR, method = "knn",
                trControl = trainControl(method = "cv"),
                preProcess = c("center", "scale"),
                tuneGrid = expand.grid(k = c(3, 5, 7, 9, 11)))
knnFit
## k-Nearest Neighbors 
## 
## 26779 samples
##    98 predictor
## 
## Pre-processing: centered (98), scaled (98) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 24101, 24102, 24100, 24100, 24102, 24101, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    3  41170.31  0.8889125  17430.47
##    5  41651.39  0.8868224  18279.44
##    7  42215.01  0.8843063  18924.80
##    9  43095.80  0.8797840  19589.06
##   11  43852.48  0.8757505  20177.32
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 3.
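The three columns in the table above can be reproduced by hand for any pair of observed and predicted vectors. The sketch below uses small made-up vectors (`obs` and `pred` are hypothetical, not values from this dataset) and assumes, as far as I am aware of caret's default summary, that R-squared is reported as the squared correlation between observations and predictions.

```r
# Hypothetical observed and predicted values (not taken from the results above)
obs  <- c(10, 12, 15, 20)
pred <- c(11, 12, 13, 22)

rmse <- sqrt(mean((obs - pred)^2))   # root mean squared error
mae  <- mean(abs(obs - pred))        # mean absolute error
rsq  <- cor(obs, pred)^2             # squared correlation, used here as R-squared

c(RMSE = rmse, MAE = mae, Rsquared = rsq)
```

The same three lines could be applied to the test-set predictions and `dataTE$Total_Power` to obtain a hold-out RMSE that is comparable with the cross-validated one.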
# Slow step: kNN is a "lazy learner", so the work is done at prediction time, when each test row is compared with all 26779 training rows across 98 features
y <- predict(knnFit, newdata = dataTE)
y
##    [1] 4103168 4071353 3803601 3776276 3853590 3752097 3803601 3770150 3724490
##   [10] 3784544 3744664 3797295 3820930 3784544 3752387 3744664 3853672 3717702
##   [19] 3837898 3750862 3797295 3786952 3827248 3759527 3785089 3819342 3867383
##   [28] 3693528 3752387 3787482 3819342 3797295 3819342 3822251 3820930 3782815
##   [37] 3786573 3820930 3810664 3787482 3797295 3820930 3819342 3820930 3819342
##   [46] 3628615 3819342 3833252 3820930 3819342 3833252 3820930 3839010 3855819
##   [55] 3855819 3855819 3833252 3833252 3839263 3839010 3850110 3843231 3850110
##   [64] 3839263 3834191 3840014 3840014 3839690 3839690 3888885 3874980 3846880
##   [73] 3834191 3849424 3841857 3874980 3850110 3860866 3860771 3846880 3888885
##   [82] 3850410 3850110 3860771 3830816 3781788 3850110 3846853 3888885 3888885
##   [91] 3880570 3888885 3850110 3888885 3846853 3888885 3888885 3888885 3880570
##  [100] 3880570 3773695 3868533 3888885 3888885 3888885 3773695 3880570 3888885
##  [109] 3868533 3888885 3880570 3888885 3888885 3888885 3888885 3888885 3880570
##  [118] 3888885 3888885 3888885 3880570 3888885 3888885 3880570 3888885 3880570
##  [127] 3880570 3888885 3787167 3888885 3888885 3888885 3885153 3887221 3774835
##  [136] 3888885 3888885 3913427 3888885 3681724 3913427 3885336 3811369 3861307
##  [145] 3914857 3931538 3888885 3931538 3914857 3925351 3922776 3888885 3888885
##  [154] 3914857 3922776 3925681 3924446 3914857 3797018 3931538 3945047 3924446
##  [163] 3924373 3928965 3737578 3942707 3931538 3912924 3924446 3894268 3930843
##  [172] 3897566 3905286 3925681 3941299 3925904 3904653 3941299 3945360 3941299
##  [181] 3941299 3945360 3951259 3931326 3838813 3922132 3951259 3941299 3945360
##  [190] 3791776 3918172 3940407 3896724 3961291 3941299 3951259 3948201 3931326
##  [199] 3968021 3910064 3968389 3919026 3951259 3968021 3955200 3910064 3903205
##  [208] 3921774 3955200 3968021 3736466 3945204 3934930 3924836 3963499 3779840
##  [217] 3968021 3939324 3910040 3965510 3910040 3965510 3895452 3945204 3939998
##  [226] 3951496 3965510 3952249 3919608 3956152 3756612 3945204 3951496 3951496
##  [235] 3846247 3951496 3951058 3951496 3951496 3893402 3951496 3951496 3951496
##  [244] 3961291 3791776 3918172 3918031 3951259 3938632 3931326 3919026 3951259
##  [253] 3906009 3955200 3946713 3951259 3955200 3951231 3945204 3934930 3736466
##  [262] 3934930 3779840 3910949 3951231 3945204 3952249 3939998 3952249 3965510
##  [271] 3895452 3951496 3965044 3951496 3822073 3910040 3951496 3919608 3951496
##  [280] 3951496 3951058 3951496 3951496 3951058 3951496 3977318 3951496 3951496
##  [289] 3951496 3951058 3951496 3753991 3766396 3770638 3784528 3792849 3800395
##  [298] 3806607 3809464 3821759 3837667 3843230 3844358 3846046 3867730 3891015
##  [307] 3896035 3899605 3911316 3944105 3946279 3951223 3954258 3970222 3970349
##  [316] 3977535 3985981 3994488 3995019 4008863 4012643 4016595 4019070 4020363
##  [325] 4025478 4029583 4030790 4039659 4053802 4056587 4063530 3688079 3693997
##  [334] 3701816 3704442 3728133 3737439 3737439 3757711 3764329 3771793 3790186
##  [343] 3790589 3809208 3809208 3809208 3833227 3838830 3839680 3839779 3839779
##  [352] 3850804 3850846 3850846 3850846 3859672 3859672 3861468 3897013 3897013
##  [361] 3897013 3897133 3897133 3897133 3897133 3897358 3912953 3913527 3914309
##  [370] 3916824 3917346 3930956 3930956 3930956 3934803 3934803 3937354 3937354
##  [379] 3937354 3942461 3942461 3942461 3942461 3950804 3950804 3950804 3950804
##  [388] 3950804 3951770 3956863 3958942 3958942 3962896 3962896 3962896 3962896
##  [397] 3962920 3962920 3962920 3963306 3963306 3963306 3963818 3963818 3968799
##  [406] 3968799 3968799 3970981 3972276 3973066 3973272 3976563 3976563 3978550
##  [415] 3979455 3979966 3982472 3982472 3982472 3982472 3982472 3982472 3982472
##  [424] 3982472 3985357 3985357 3985468 3988679 3988679 3988679 3988679 3991461
##  [433] 3997914 3660088 3672960 3707323 3692795 3687324 3692633 3699750 3736621
##  [442] 3700793 3736621 3731673 3759132 3704083 3731348 3741108 3704285 3749140
##  [451] 3768599 3758067 3730611 3771255 3764633 3768259 3796330 3750550 3778960
##  [460] 3786789 3792695 3770624 3777838 3778960 3779319 3779717 3770624 3792695
##  [469] 3794361 3789993 3786496 3779717 3786789 3785188 3793103 3788655 3789993
##  [478] 3792598 3793415 3793745 3793322 3800231 3799684 3800658 3800231 3796836
##  [487] 3799322 3799058 3799684 3800010 3802154 3795824 3801718 3794329 3802208
##  [496] 3800231 3802154 3811367 3802208 3799058 3800010 3811367 3811034 3799322
##  [505] 3802541 3818763 3800018 3810824 3805428 3804106 3815264 3807201 3815912
##  [514] 3800207 3815181 3810830 3812041 3820802 3817862 3816727 3822250 3825411
##  [523] 3813056 3837516 3824499 3815318 3844697 3839953 3840476 3841490 3841667
##  [532] 3849334 3842306 3846674 3847859 3847859 3847454 3851387 3846964 3851626
##  [541] 3849309 3843500 3855917 3855461 3851843 3849997 3846964 3850994 3852801
##  [550] 3859680 3856521 3860679 3862595 3857515 3867748 3861462 3861450 3860803
##  [559] 3861910 3866461 3867827 3864396 3867573 3860395 3863005 3865653 3874096
##  [568] 3866948 3868706 3867064 3869203 3866374 3873548 4149818 4149818 4149818
##  [577] 4149818 4149818 4149818 4149818 4148351 4148351 4148351 4148351 4148351
##  [586] 4148351 4148351 4148211 4148211 4148211 4148211 4148211 4148211 4148211
##  [595] 4149019 4149019 4149019 4149019 4149019 4149019 4149019 4149680 4149680
##  [604] 4149680 4149680 4149680 4149680 4149680 4151062 4151062 4151062 4151062
##  [613] 4151062 4151062 4151062 4149068 4149068 4149068 4149068 4149068 4149068
##  [622] 4149068 4147408 4147408 4147408 4147408 4147408 4147408 4147408 4145766
##  [631] 4145766 4145766 4145766 4145766 4145766 4145766 4142373 4142373 4142373
##  [640] 4142373 4142373 4142373 4142373 4139042 4139042 4139042 4139042 4139042
##  [649] 4139042 4139042 4134190 4137755 4129964 4129964 4129964 4129964 4129964
##  [658] 4129964 4129964 4130218 4130218 4130218 4130218 4130218 4130218 4130218
##  [667] 4131001 4131001 4131001 4131001 4131001 4131001 4131001 4130720 4130720
##  [676] 4130720 4130720 4130720 4130720 4130720 4132472 4132472 4132472 4132472
##  [685] 4132472 4132472 4132472 4132119 4132119 4132119 4132119 4132119 4132119
##  [694] 4132119 4132901 4132901 4132901 4132901 4132901 4132901 4132901 4136427
##  [703] 4130549 4133027 4133027 4133027 4133027 4133027 4133027 4133027 4131965
##  [712] 4131965 4131965 4131965 4131965 4131965 4131965 4131189 4131189 4131189
##  [721] 4131189 4131189 4131189 4131189 4154449 4131073 4131073 4131073 4131073
##  [730] 4131073 4131073 4131073 4131273 4131273 4131273 4131273 4131273 4131273
##  [739] 4131273 4131066 4131066 4131066 4131066 4131066 4131066 4131066 4144042
##  [748] 4130402 4130402 4130402 4130402 4130402 4130402 4130402 4157805 4130817
##  [757] 4130817 4130817 4130817 4130817 4130817 4130817 4157050 4131743 4131743
##  [766] 4131743 4131743 4131743 4131743 4131743 4152219 4154671 4131842 4131842
##  [775] 4131842 4131842 4131842 4131842 4131842 4153568 4153080 4131637 4131637
##  [784] 4131637 4131637 4131637 4131637 4131637 4153267 4131370 4131370 4131370
##  [793] 4131370 4131370 4131370 4131370 4152646 4151879 4148924 4145847 4136053
##  [802] 4130433 4127497 4128064 4130158 4131285 4131462 4131462 4131462 4131462
##  [811] 4131462 4131462 4131462 4141557 4127062 4127606 4128808 4131819 4131753
##  [820] 4131472 4131462 4131462 4131462 4131462 4131462 4131462 4135984 4135677
##  [829] 4135185 4134897 4136355 4139893 4128559 4161912 4146069 4159540 4161047
##  [838] 4164047 4158404 4159809 4136102 4149894 4140077 4152002 4145492 4152410
##  [847] 4152300 4151493 4162818 4155318 4163328 4163458 4160904 4152539 4155541
##  [856] 4154795 3688130 3693840 3693997 3697559 3697559 3727971 3728133 3757924
##  [865] 3757924 3764329 3764967 3771793 3784982 3785013 3790589 3792276 3809208
##  [874] 3809208 3822509 3822509 3822509 3822509 3823060 3823060 3833227 3839680
##  [883] 3844394 3850846 3851000 3858984 3859672 3861499 3861499 3897013 3897013
##  [892] 3897013 3897013 3897133 3897133 3913527 3914309 3916824 3917346 3919919
##  [901] 3919919 3919919 3930799 3934099 3934099 3934099 3934099 3934803 3934803
##  [910] 3934803 3937354 3942461 3950804 3950804 3950804 3950804 3950804 3950804
##  [919] 3955440 3955440 3955440 3955440 3956863 3956863 3958899 3958899 3958942
##  [928] 3959812 3960081 3962896 3962896 3962920 3963306 3963335 3964413 3968799
##  [937] 3968799 3968799 3968799 3972276 3973055 3976563 3976563 3976563 3978550
##  [946] 3979305 3979966 3981155 3981155 3982458 3982472 3982472 3982472 3982472
##  [955] 3985357 3988679 3988679 3988679 3988679 3989493 3989493 3989609 3991461
##  [964] 3997640 3997914 3997914 3829226 3753991 3758148 3761644 3761873 3766600
##  [973] 3768476 3777661 3783980 3784406 3787042 3796001 3800395 3803001 3822212
##  [982] 3822212 3823571 3823571 3831665 3831665 3837760 3840748 3844358 3846046
##  [991] 3846046 3849453 3860968 3862922 3867693 3867730 3871156 3872896 3881477
## [1000] 3881477 3881477 3881545 3881545 3881545 3882290 3890848 3891006 3895728
## [1009] 3896283 3896283 3896283 3896283 3896283 3899605 3899605 3899605 3899605
## [1018] 3901291 3902016 3902016 3903362 3914740 3914740 3915792 3922349 3925898
## [1027] 3932452 3932452 3932452 3932452 3932452 3932452 3937746 3938839 3938839
## [1036] 3942365 3942365 3942365 3942365 3943425 3944105 3944105 3945509 3945509
## [1045] 3945509 3945509 3945509 3945509 3945509 3945509 3950482 3950482 3950482
## [1054] 3952456 3956492 3968064 3968064 3968064 3968064 3969493 3969493 3970349
## [1063] 3970349 3977810 3977810 3979273 3979273 3979273 3980583 3980583 3980583
## [1072] 3980583 3980583 3980583 3980583 3980583 3980583 3980583 3980583 3980583
## [1081] 3980912 3980912 3980912 3980912 3980912 3980912 3980912 3983770 3985981
## [1090] 3985981 3985981 3987580 3987580 3987580 3987580 3987580 3991265 3992195
## [1099] 3992195 3992514 3992514 3992514 3992514 3992514 3992514 3992514 3992514
## [1108] 3992514 3993217 3993217 3993217 3994488 4001446 4001833 4009002 4011808
## [1117] 4011808 4011808 4016595 4016595 4019070 4019070 4019070 4019070 4019782
## [1126] 4019782 4019782 4019990 4020363 4022913 4024738 4024738 4025356 4025356
## [1135] 4028436 4028436 4028801 4028801 4029583 4029583 4029583 4029583 4029583
## [1144] 4029583 4029583 4030790 4030790 4030790 4030790 4030790 4030790 4030790
## [1153] 4030790 4030790 4030790 4034016 4034016 4035827 4035827 4035827 4036269
## [1162] 4039659 4039659 4039659 4039659 4039659 4039659 4039659 4039659 4039659
## [1171] 4039659 4039659 4041292 4041292 4041292 4041292 4041292 4042118 4042118
## [1180] 4042118 4042118 4042118 4042118 4042118 4045959 4045959 4047863 4047863
## [1189] 4048093 4048093 4048093 4048093 4048093 4048093 4048093 4048093 4048093
## [1198] 4048093 4048093 4048309 4048309 4048309 4048309 4048309 4048309 4048309
## [1207] 4048309 4048309 4048309 4048309 4048664 4048664 4048664 4048664 4048664
## [1216] 4053308 4053308 4053855 4053855 4053855 4053855 4053855 4053855 4053855
## [1225] 4055274 4055274 4055274 4055274 4056126 4056587 4056587 4056587 4060840
## [1234] 4060840 4060840 4060840 4060840 4060840 4060840 4060840 4060840 4063530
## [1243] 4063530 4063530 4063530 4063530 4063530 4063530 4063530 4063530 4063530
## [1252] 4063530 4063530 4063530 4063530 4063530 4065111 4065111 4065111 4065111
## [1261] 4065634 4065634 4065634 4065634 4065634 4065634 4065634 3762003 3762003
## [1270] 3762822 3762822 3771269 3775206 3775623 3775623 3786144 3786144 3791926
## [1279] 3796022 3796154 3796154 3796154 3800278 3800866 3812984 3812984 3812984
## [1288] 3812984 3813219 3821659 3828475 3828710 3830986 3830986 3831990 3858220
## [1297] 3858220 3858220 3858220 3858220 3858220 3858220 3858220 3858220 3858220
## [1306] 3858220 3858220 3865058 3866843 3866843 3868115 3868115 3868115 3868115
## [1315] 3870045 3870045 3870191 3881561 3881883 3881883 3884149 3887619 3890349
## [1324] 3891404 3891404 3897016 3900247 3900247 3900247 3900247 3900247 3900247
## [1333] 3909370 3909370 3909851 3911732 3915555 3915555 3915555 3917393 3917393
## [1342] 3917397 3921994 3925970 3928593 3928938 3928938 3928938 3928938 3928938
## [1351] 3928938 3951632 3952590 3952590 3952590 3953579 3953579 3953579 3953579
## [1360] 3953940 3960981 3960981 3962854 3963301 3963301 3963301 3963301 3963301
## [1369] 3963815 3976549 3977168 3977168 3977168 3977660 3977660 3979809 3979921
## [1378] 3979921 3979921 3990236 3990236 3990236 3990236 3990236 3990236 3990236
## [1387] 3990236 3990236 3990236 3990236 3990236 3990236 4000547 4000547 4000547
## [1396] 4000547 4000547 4000547 4000547 3781627 3786775 3786775 3787062 3800187
## [1405] 3805853 3818228 3825045 3825045 3825045 3827918 3827918 3830347 3834830
## [1414] 3836948 3838897 3838897 3843213 3843213 3843410 3843717 3843721 3856051
## [1423] 3856051 3864845 3864845 3865859 3865859 3865859 3867643 3867643 3875343
## [1432] 3875414 3875414 3875414 3879614 3879614 3879614 3882652 3884756 3900806
## [1441] 3901408 3902844 3902844 3910664 3910664 3910664 3919462 3919462 3919462
## [1450] 3919462 3919462 3919462 3919462 3919462 3925764 3926038 3926038 3926038
## [1459] 3926038 3926038 3926038 3926038 3926038 3928189 3928189 3928189 3928189
## [1468] 3928243 3928243 3928243 3928293 3935659 3936863 3938004 3941463 3941463
## [1477] 3942585 3942585 3942714 3942790 3942790 3942790 3943654 3943654 3943654
## [1486] 3944948 3944948 3945706 3950202 3950202 3950202 3957830 3957830 3957830
## [1495] 3957830 3957830 3957830 3957830 3958012 3958012 3958012 3958012 3958012
## [1504] 3967920 3967920 3968511 3970911 3970911 3970911 3972379 3972567 3972567
## [1513] 3972567 3972567 3972567 3978396 3978396 3978396 3978396 3978396 3978396
## [1522] 3980708 3982799 3982799 3982799 3770588 3770588 3772629 3781641 3791091
## [1531] 3791091 3791856 3791856 3793929 3793950 3794957 3808510 3808510 3814059
## [1540] 3815110 3820772 3820772 3822372 3822372 3823320 3824084 3839331 3839331
## [1549] 3841489 3841489 3854008 3861693 3862018 3864767 3864767 3864767 3865673
## [1558] 3865673 3869102 3869102 3869704 3875136 3875239 3875239 3875239 3875239
## [1567] 3875714 3879643 3880240 3880240 3881632 3881661 3881661 3881661 3881661
## [1576] 3881661 3881661 3881661 3883844 3883844 3884195 3884195 3886424 3886424
## [1585] 3891067 3892635 3892635 3892635 3892635 3892635 3892758 3897228 3897416
## [1594] 3897416 3897416 3897416 3897967 3901847 3917786 3918851 3918851 3918851
## [1603] 3918851 3926885 3926885 3928087 3928087 3928087 3928087 3928087 3928087
## [1612] 3928159 3929820 3939568 3943178 3943178 3943178 3943577 3943577 3944303
## [1621] 3944303 3948462 3948462 3948757 3952093 3952093 3952093 3952601 3952601
## [1630] 3954471 3954471 3958737 3965745 3965905 3965905 3966238 3966238 3966238
## [1639] 3969757 3969757 3969757 3969971 3969971 3971257 3971257 3971257 3762188
## [1648] 3762188 3768251 3768285 3768285 3768285 3768285 3768285 3768285 3768285
## [1657] 3768285 3768285 3768285 3768285 3768285 3768285 3768285 3768285 3768285
## [1666] 3768285 3768285 3768285 3768285 3768285 3768285 3768285 3768285 3768285
## [1675] 3768285 3768285 3768285 3768285 3769009 3769009 3769009 3769009 3769009
## [1684] 3769009 3769009 3769009 3769009 3769009 3769009 3769009 3769009 3769009
## [1693] 3769009 3769009 3769732 3769732 3769732 3782044 3783116 3785728 3785728
## [1702] 3796184 3800993 3809849 3811327 3811327 3814038 3814038 3818579 3818579
## [1711] 3823021 3823884 3823966 3829816 3829816 3830598 3835373 3835373 3835373
## [1720] 3860428 3860428 3860765 3860765 3860765 3860765 3862739 3862781 3862781
## [1729] 3862781 3862781 3862781 3862781 3862916 3767772 3767772 3774766 3774766
## [1738] 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766
## [1747] 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766
## [1756] 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766
## [1765] 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766
## [1774] 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766 3774766
## [1783] 3774766 3774766 3774766 3774766 3774766 3774766 3775228 3775228 3775228
## [1792] 3775228 3792107 3792107 3792107 3815798 3818445 3827596 3828179 3828179
## [1801] 3828639 3828639 3828639 3828880 3832682 3832682 3834454 3839513 3839513
## [1810] 3855712 3858139 3858139 3860388 3860388 3860388 3860388 3869910 3873897
## [1819] 3890925 3890945 3891417 3891417 3891417 3891417 3891417 3900611 3900897
## [1828] 3900897 3900897 3900897 3900903 3900903 3900903 3900903 3900903 3900903
## [1837] 3900903 3900903 3901581 3905162 3905634 3905634 3907125 3907125 3907125
## [1846] 3910761 3912052 3912688 3795269 3795858 3795858 3802820 3819621 3828981
## [1855] 3828981 3843852 3849214 3850203 3853408 3853408 3853786 3859574 3876496
## [1864] 3876901 3876901 3881072 3883319 3883319 3884238 3884238 3885887 3898473
## [1873] 3908887 3909480 3909480 3909480 3909480 3909480 3909480 3909480 3912813
## [1882] 3922218 3922218 3922515 3923753 3925855 3927301 3927301 3927301 3927301
## [1891] 3933854 3933854 3935637 3937805 3938483 3938483 3938830 3958208 3958208
## [1900] 3958208 3959154 3959154 3959154 3959154 3959154 3959154 3959154 3962733
## [1909] 3963923 3963923 3970036 3971107 3971107 3971107 3971107 3971107 3971107
## [1918] 3971107 3971107 3971107 3971107 3971298 3983698 3984504 3984504 3984504
## [1927] 3987927 3988061 3993010 3993010 3995066 3995066 3995066 3995066 3996934
## [1936] 3996934 3999984 4000749 4000749 4000749 4001969 4001969 4002111 4002703
## [1945] 4002703 4002703 4003334 4003334 4003334 4005440 4005440 4005440 4005440
## [1954] 4005440 4005440 4005440 4005563 4005563 4005563 4005563 4005563 4005563
## [1963] 4005563 4009510 3757082 3757082 3757082 3762886 3762886 3762886 3762886
## [1972] 3762886 3762886 3762886 3762886 3762886 3762886 3762886 3762886 3762886
## [1981] 3762886 3762886 3762886 3762886 3762886 3762886 3762886 3762886 3762886
## [1990] 3765038 3765658 3767999 3770552 3787466 3815321 3815321 3825872 3825872
## [1999] 3825872 3825872 3825872 3825872 3825872 3825872 3827004 3830489 3830489
## [2008] 3830489 3830489 3830489 3830489 3830489 3830489 3830489 3830489 3831358
## [2017] 3831358 3869915 3875821 3875821 3875821 3875821 3875821 3875821 3875821
## [2026] 3875821 3875821 3875821 3875821 3884619 3884619 3884619 3890555 3891726
## [2035] 3891726 3894245 3906236 3914368 3914368 3914368 3914368 3914755 3915508
## [2044] 3920413 3931696 3933917 3936428 3936428 3936428 3941318 3941509 3941509
## [2053] 3941509 3941509 3941509 3942072 3942072 3942235 3942235 3942235 3942235
## [2062] 3944711 3944968 3952971 3952971 3952971 3952971 3952971 3952971 3952971
## [2071] 3952971 3957207 3957207 3960290 3960290 3960290 3961976 3962792 3962842
## [2080] 3962842 3962842 3962842 3962842 3962842 3963005 3969620 3969750 3969750
## [2089] 3759950 3759950 3767731 3776287 3776287 3776287 3776287 3776287 3776287
## [2098] 3776287 3776287 3776287 3776287 3776287 3776287 3776287 3776287 3776287
## [2107] 3776287 3776287 3787220 3787220 3787220 3787220 3787220 3787220 3787220
## [2116] 3787220 3787220 3787220 3787220 3792111 3797292 3797292 3798046 3812633
## [2125] 3812633 3827814 3835591 3835591 3835591 3835591 3846261 3846261 3846261
## [2134] 3846261 3849884 3849884 3860935 3860935 3860935 3860935 3860935 3860935
## [2143] 3860935 3860935 3861003 3863247 3865391 3865391 3868137 3876179 3877669
## [2152] 3877669 3877669 3877669 3877669 3877669 3877669 3877669 3877669 3877669
## [2161] 3881457 3881457 3884411 3884411 3884879 3889285 3889285 3889285 3890896
## [2170] 3892567 3895927 3896117 3896117 3896117 3896117 3896712 3896712 3897983
## [2179] 3909091 3909091 3909091 3909091 3909091 3916117 3918845 3923225 3923225
## [2188] 3923225 3923225 3942049 3944239 3952290 3952290 3952290 3955569 3955569
## [2197] 3957832 3958929 3958929 3963376 3964867 3964867 3964867 3964867 3964867
## [2206] 3964867 3964906 3969743 3970875 3971133 3975690 3975690 3980441 3981272
## [2215] 3981272 3982707 3983719 3983719 3983719 3983719 3987588 3778234 3778836
## [2224] 3780216 3780216 3780216 3780216 3780216 3780216 3780216 3780216 3780216
## [2233] 3780216 3780216 3780216 3780216 3780216 3780216 3780216 3780216 3789832
## [2242] 3789832 3789832 3793444 3793444 3794601 3795966 3808911 3808911 3810294
## [2251] 3814927 3814927 3814927 3814927 3814927 3824485 3827922 3828055 3828055
## [2260] 3831024 3831056 3833847 3845796 3847689 3847880 3851252 3851402 3856971
## [2269] 3856971 3857561 3857561 3857561 3857561 3857561 3858075 3858469 3866420
## [2278] 3871513 3878494 3878494 3885553 3886716 3886716 3886716 3887651 3887651
## [2287] 3887651 3887651 3891161 3891161 3891161 3891161 3891161 3893769 3893769
## [2296] 3894017 3894703 3906197 3906197 3908495 3910240 3910426 3910426 3910426
## [2305] 3911561 3911561 3913374 3920623 3921201 3921201 3921201 3921201 3921201
## [2314] 3922896 3922896 3922896 3922896 3931529 3931529 3932542 3943021 3946251
## [2323] 3946251 3946251 3946251 3946251 3946251 3948079 3948079 3951722 3951722
## [2332] 3951722 3951722 3951722 3951722 3951722 3951807 3951989 3971376 3971376
## [2341] 3971376 3971376 3973883 3973883 3973883 3973883 3973883 3975540 3975540
## [2350] 3977019 3977019 3977019 3977019 3977019 3977019 3985524 3985788 3986821
## [2359] 3986834 3986834 3986834 3988941 3988941 3988941 3988941 3991095 3795218
## [2368] 3797839 3799187 3799187 3799187 3799187 3799187 3799187 3799187 3799187
## [2377] 3799187 3799187 3799187 3799187 3799187 3799187 3799187 3799187 3799187
## [2386] 3799187 3799187 3799187 3799187 3799187 3799187 3799187 3799187 3799187
## [2395] 3799187 3799187 3799187 3799187 3799187 3799187 3799294 3806779 3806779
## [2404] 3821339 3821563 3825794 3825794 3826493 3833598 3840845 3840845 3846420
## [2413] 3846420 3846466 3846466 3855394 3860071 3860071 3860403 3864017 3864017
## [2422] 3864017 3869028 3869028 3871264 3871785 3875696 3875973 3875973 3875973
## [2431] 3876264 3879497 3879541 3883188 3883389 3886484 3888451 3889130 3889130
## [2440] 3889130 3889130 3897176 3899191 3899235 3910535 3910535 3911271 3913011
## [2449] 3913660 3917327 3917327 3928308 3932206 3932206 3932206 3932206 3932206
## [2458] 3932206 3932206 3932206 3932206 3935445 3935445 3935445 3935445 3937036
## [2467] 3937036 3937036 3937036 3937036 3938139 3938626 3940504 3940504 3940504
## [2476] 3940504 3940504 3940504 3940504 3943558 3943558 3777680 3795166 3795166
## [2485] 3799544 3800115 3800848 3810898 3833438 3833438 3851147 3851147 3851147
## [2494] 3851971 3851971 3855999 3856232 3856232 3856232 3856232 3856232 3871063
## [2503] 3874714 3874714 3874714 3875015 3875015 3877080 3879878 3882900 3902162
## [2512] 3902162 3903849 3903916 3903916 3903916 3918046 3918046 3918046 3918046
## [2521] 3918046 3918143 3920718 3924871 3929025 3929463 3929504 3929504 3929504
## [2530] 3929504 3929504 3929504 3929504 3929504 3932829 3932829 3933459 3933738
## [2539] 3942851 3943953 3943953 3943953 3943953 3943953 3943953 3943953 3943953
## [2548] 3943953 3943953 3943953 3943953 3943953 3943953 3943953 3943953 3943953
## [2557] 3943953 3948663 3948663 3951673 3951673 3951673 3951673 3951673 3951673
## [2566] 3951673 3951673 3951673 3955383 3955383 3955383 3955383 3955383 3955383
## [2575] 3955383 3955383 3955383 3955383 3955383 3955383 3955383 3955383 3955383
## [2584] 3960273 3961416 3961519 3964429 3972861 3972861 3972861 3972861 3975258
## [2593] 3975258 3975258 3975258 3977159 3977299 3977299 3982792 3983230 3765268
## [2602] 3766334 3767924 3771946 3771946 3777220 3777329 3777401 3777401 3777401
## [2611] 3777401 3786536 3795701 3795701 3795701 3795701 3795701 3795701 3795701
## [2620] 3795701 3795701 3795701 3795701 3795701 3795701 3795701 3795701 3795701
## [2629] 3795701 3795701 3795701 3795701 3795701 3795701 3795701 3795701 3795701
## [2638] 3795701 3795701 3795701 3824773 3824773 3824773 3824773 3824777 3831095
## [2647] 3836562 3838371 3844347 3848788 3849238 3849238 3849238 3866614 3870227
## [2656] 3885892 3885892 3897142 3897142 3904555 3904555 3904555 3904750 3912010
## [2665] 3912357 3912357 3912357 3912357 3912357 3912357 3920465 3920465 3920465
## [2674] 3920465 3920465 3920465 3926273 3929755 3929755 3934961 3936329 3936329
## [2683] 3936329 3936329 3936329 3936329 3936329 3936329 3948463 3948463 3948463
## [2692] 3948463 3948463 3948463 3948463 3948463 3953130 3953130 3953130 3953280
## [2701] 3954592 3954592 3954592 3954592 3954592 3954592 3954592 3954592 3956657
## [2710] 3956657 3970186 3972021 3972021 3980133 3980133 3980133 3980133 3980133
## [2719] 3980133 3980133 3980133 3980133 3980133 3980133 3980133 3980133 3980133
## [2728] 3980133 3980133 3980133 3980133 3980133 3981465 3981465 3981465 3981465
## [2737] 3981465 3981465 3985948 3985948 3988268 3988268 3988268 3988268 3988268
## [2746] 3988268 3988268 3988803 3993722 3993722 3993722 3993722 3993722 3993722
## [2755] 3993722 3993722 3993722 3994334 3994334 3994334 3994334 3994334 3994334
## [2764] 3994334 3772087 3772087 3776909 3776909 3789375 3791435 3814857 3819331
## [2773] 3826688 3827018 3830331 3833171 3843815 3845859 3849862 3849862 3849862
## [2782] 3849862 3856218 3856878 3856878 3864686 3869912 3874690 3874690 3874690
## [2791] 3874690 3874690 3876084 3878292 3878643 3883853 3884511 3885368 3886085
## [2800] 3886085 3886085 3886085 3886725 3892972 3893516 3893516 3893516 3895521
## [2809] 3895521 3895521 3895521 3900600 3900607 3904071 3906414 3908978 3908978
## [2818] 3908978 3909104 3909320 3916475 3920287 3928839 3930838 3931719 3940951
## [2827] 3942173 3942173 3942441 3946859 3946859 3946859 3946859 3946859 3946957
## [2836] 3946957 3955953 3955953 3955953 3955953 3955953 3960253 3960253 3962995
## [2845] 3962995 3962995 3968403 3968403 3968403 3968403 3968403 3968403 3968403
## [2854] 3968403 3968403 3968403 3969838 3969838 3969838 3969838 3969838 3970221
## [2863] 3970221 3970221 3973092 3973092 3973540 3974919 3975022 3975022 3977958
## [2872] 3977958 3977958 3977958 3980907 3981218 3981573 3981573 3981573 3983652
## [2881] 3983652 3984154 3984154 3984154 3984154 3984154 3984154 3984154 3984154
## [2890] 3984154 3984154 3984154 3984154 3984154 3984154 3984154 3984154 3984154
## [2899] 3984160 3984953 3994493 3994493 3994969 3994969 3994969 3994969 3994969
## [2908] 3994969 3994969 3998969 3998969 3761277 3782775 3782775 3796942 3797402
## [2917] 3812754 3812754 3812754 3816391 3817201 3817715 3824917 3846539 3840083
## [2926] 3840515 3846539 3846539 3856388 3856573 3856573 3857594 3860362 3860362
## [2935] 3860362 3860362 3860362 3860362 3860362 3862717 3862717 3862717 3862717
## [2944] 3862717 3862717 3862717 3862717 3862717 3862717 3862717 3862717 3862717
## [2953] 3862717 3862717 3862717 3862717 3862717 3863229 3874071 3874071 3877279
## [2962] 3877279 3877279 3877279 3880547 3880547 3880967 3902408 3912747 3915239
## [2971] 3916297 3916297 3916297 3918248 3918248 3918248 3918355 3918355 3918355
## [2980] 3918355 3918355 3918355 3918355 3918550 3921071 3921071 3921071 3921260
## [2989] 3921260 3926217 3926217 3928959 3930249 3931965 3941416 3941416 3953015
## [2998] 3953015 3953015 3953015 3953015 3955295 3958520 3958520 3969841 3969841
## [3007] 3969841 3979631 3984213 3985054 3987840 3988305 3989935 3992473 3992936
## [3016] 3992936 3992936 3992936 3994141 3994141 3996269 3996269 3996269 3996269
## [3025] 4002559 4002559 4002559 4002559 3761304 3761304 3764679 3764679 3764679
## [3034] 3765952 3778952 3790055 3790055 3790055 3794519 3808944 3821102 3821102
## [3043] 3829578 3829578 3829745 3833176 3833176 3837935 3837935 3837935 3837935
## [3052] 3837935 3837935 3837935 3845916 3845916 3846209 3846209 3848585 3848585
## [3061] 3848585 3848585 3848585 3848585 3848585 3848585 3848585 3849191 3849191
## [3070] 3865539 3865539 3865539 3865539 3865539 3865539 3865539 3866010 3878305
## [3079] 3878305 3886567 3886745 3886745 3886745 3886745 3886745 3886745 3886745
## [3088] 3886745 3886745 3886745 3886821 3886821 3894900 3894906 3894906 3898220
## [3097] 3898220 3898476 3906879 3906879 3912396 3912396 3912396 3912396 3912396
## [3106] 3912396 3912396 3912396 3912396 3912396 3929265 3931068 3931068 3940288
## [3115] 3942143 3942657 3944682 3952891 3952660 3952660 3954231 3954519 3954519
## [3124] 3954519 3955389 3955389 3955389 3955531 3957637 3962672 3963335 3966303
## [3133] 3977743 3977743 3980487 3981072 3981072 3982068 3989961 3990863 3990863
## [3142] 3990863 3990863 3991168 3991168 3991168 3991168 3991168 3991168 3991168
## [3151] 3991168 3991168 3991168 3991168 3991168 3991168 3991168 3991168 3996611
## [3160] 3996951 3996951 3999519 3999844 3999844 4001412 3760802 3779022 3779022
## [3169] 3779022 3779022 3779022 3779022 3779022 3779022 3779022 3779022 3779022
## [3178] 3779022 3779022 3779022 3779022 3779022 3779022 3779022 3779022 3779022
## [3187] 3786675 3786675 3791264 3795529 3795529 3797240 3814022 3814022 3827065
## [3196] 3828415 3828415 3828415 3828415 3828415 3828448 3829163 3829163 3829163
## [3205] 3829163 3829163 3829661 3833246 3846116 3846852 3846852 3854959 3852790
## [3214] 3852790 3858685 3858685 3869027 3871495 3872604 3872604 3885924 3896062
## [3223] 3897560 3897560 3898157 3904745 3904745 3905886 3920757 3920757 3920797
## [3232] 3932077 3935169 3938762 3938762 3942204 3942204 3945575 3946120 3946120
## [3241] 3946120 3946868 3946868 3946868 3946868 3778074 3792095 3792140 3792140
## [3250] 3792140 3799702 3800491 3803461 3803461 3803461 3806014 3812428 3812428
## [3259] 3813363 3818114 3818747 3818747 3818747 3825034 3830126 3849846 3849846
## [3268] 3851162 3851162 3851162 3851162 3852552 3859978 3859978 3866492 3866492
## [3277] 3868292 3868292 3870503 3870503 3871926 3872063 3879283 3879283 3880642
## [3286] 3880642 3880642 3882042 3882042 3883679 3883679 3883679 3888610 3888748
## [3295] 3888748 3888748 3888748 3896501 3896633 3897698 3897698 3897698 3906280
## [3304] 3906280 3906280 3909365 3909365 3909365 3909365 3909365 3909365 3909365
## [3313] 3909365 3909365 3909365 3909365 3909365 3909980 3909980 3909980 3909980
## [3322] 3911797 3911797 3911797 3911797 3911797 3911797 3915270 3915270 3915270
## [3331] 3915270 3915270 3915270 3922635 3922635 3922635 3931365 3931365 3931365
## [3340] 3935548 3935548 3947723 3947723 3947723 3951277 3954783 3954783 3959391
## [3349] 3960320 3960517 3962542 3962542 3962542 3962542 3962542 3962542 3962542
## [3358] 3962542 3962542 3781685 3800063 3800063 3800063 3800063 3800063 3800063
## [3367] 3800063 3800063 3800063 3800063 3800063 3800063 3800063 3800063 3804451
## [3376] 3804451 3807920 3807981 3807981 3808102 3808102 3808102 3808102 3808102
## [3385] 3808102 3808102 3808102 3808102 3808102 3813148 3813430 3813430 3813430
## [3394] 3816382 3816392 3816392 3816393 3816393 3816393 3816393 3817587 3817587
## [3403] 3826287 3826287 3826287 3826287 3826287 3831858 3840035 3840035 3846381
## [3412] 3846381 3846443 3847024 3847024 3849393 3849393 3849393 3849393 3851280
## [3421] 3851280 3862590 3863746 3868741 3868741 3868741 3868741 3870278 3870278
## [3430] 3870278 3870278 3870278 3871509 3871509 3875756 3875756 3876773 3876773
## [3439] 3881046 3881046 3890229 3915302 3915302 3915302 3918941 3918941 3921380
## [3448] 3921380 3921380 3921380 3921380 3921380 3927036 3927419 3927419 3927419
## [3457] 3927419 3933970 3943065 3944570 3944570 3944570 3944570 3944570 3944570
## [3466] 3944570 3944570 3944570 3944570 3944570 3756365 3756365 3770216 3770216
## [3475] 3770216 3770216 3770216 3770216 3770216 3770216 3770216 3770216 3770216
## [3484] 3770216 3770216 3770216 3770216 3770216 3770216 3770216 3770216 3770216
## [3493] 3770216 3770216 3770216 3770216 3770216 3770216 3770216 3770216 3770216
## [3502] 3770216 3770216 3779768 3783805 3783805 3783952 3789431 3802664 3807857
## [3511] 3834923 3835027 3835027 3836719 3837550 3837626 3837626 3837626 3848727
## [3520] 3853803 3854100 3854100 3859503 3859503 3861888 3861963 3861963 3861963
## [3529] 3862937 3862938 3862938 3862938 3867318 3867318 3867318 3867318 3867318
## [3538] 3867318 3867318 3869969 3870196 3870196 3870196 3870196 3870196 3895118
## [3547] 3895118 3895199 3897017 3898496 3901546 3907355 3907355 3908441 3908471
## [3556] 3910065 3910328 3912858 3920517 3920517 3920517 3920517 3920517 3920517
## [3565] 3920517 3920517 3920517 3920517 3921909 3921909 3921909 3921909 3921909
## [3574] 3921909 3921909 3921909 3921909 3921909 3921909 3921922 3931243 3931454
## [3583] 3931454 3931454 3938808 3939249 3939249 3962542 4009510 3940127 3993269
## [3592] 3742870 3648000 3678158 3707607 3592977 3706400 3681429 3563712 3588022
## [3601] 3742870 3706400 3658767 3700440 3742870 3643517 3659731 3742870 3654721
## [3610] 3742870 3659348 3700287 3725706 3760400 3700350 3742870 3660790 3715896
## [3619] 3742074 3760400 3726135 3663067 3774031 3706499 3734853 3760400 3695664
## [3628] 3732820 3774031 3807959 3723960 3707940 3717005 3767823 3693368 3753314
## [3637] 3798647 3737867 3742283 3738770 3818995 3789433 3787167 3710655 3793702
## [3646] 3816023 3748867 3790163 3789433 3742283 3776691 3823120 3678432 3778388
## [3655] 3775815 3789433 3796069 3789433 3796069 3823120 3823120 3760477 3775815
## [3664] 3817661 3812123 3826658 3814374 3823120 3760477 3832211 3812123 3832211
## [3673] 3760477 3823120 3806938 3836491 3863424 3824194 3814374 3843957 3823120
## [3682] 3823120 3863424 3872209 3847527 3826274 3704074 3830110 3709364 3850593
## [3691] 3780740 3821694 3837863 3870526 3870526 3870526 3869478 3870526 3856725
## [3700] 3856645 3793090 3870526 3856645 3870526 3857312 3870526 3870526 3798053
## [3709] 3793090 3856645 3870526 3856645 3870526 3870526 3770493 3905666 3870526
## [3718] 3856645 3855935 3870526 3815857 3904561 3856645 3846285 3905666 3856645
## [3727] 3905666 3905666 3860001 3856645 3905073 3846285 3870526 3891706 3891706
## [3736] 3891706 3891706 3905666 3885545 3893819 3905666 3905666 3905666 3862598
## [3745] 3905666 3922278 3895169 3922278 3922278 3862598 3905666 3922278 3905876
## [3754] 3847004 3892255 3922278 3922278 3894813 3905821 3892255 3922278 3922278
## [3763] 3906730 3922278 3922278 3922278 3922278 3922278 3886006 3922278 3922278
## [3772] 3922278 3920920 3885356 3922278 3724632 3645828 3653077 3690080 3694715
## [3781] 3671186 3644130 3663063 3724632 3802986 3705335 3792537 3644130 3727775
## [3790] 3776464 3792537 3668108 3802986 3783458 3673747 3792537 3799998 3680844
## [3799] 3792537 3783458 3785556 3735040 3788268 3836323 3846010 3836323 3783458
## [3808] 3778661 3836323 3788268 3836323 3793235 3836323 3852944 3851272 3807370
## [3817] 3807370 3836323 3794244 3812962 3836323 3858330 3848384 3833621 3829658
## [3826] 3858330 3858330 3912989 3851272 3851272 3847885 3851272 3833727 3860186
## [3835] 3868243 3850290 3858330 3819652 3819652 3858330 3870584 3822319 3849708
## [3844] 3833851 3799481 3833851 3860186 3870584 3881443 3860186 3870584 3870584
## [3853] 3828125 3870584 3881443 3870584 3833851 3854289 3805437 3873819 3881443
## [3862] 3881443 3873819 3881611 3881443 3885693 3864900 3867738 3879728 3885693
## [3871] 3885693 3851020 3881443 3885693 3881443 3868913 3869474 3881443 3915774
## [3880] 3881443 3873308 3885693 3915774 3914005 3887089 3896640 3883400 3876018
## [3889] 3907833 3908767 3909858 3915774 3908767 3883037 3915774 3908767 3915774
## [3898] 3915774 3880798 3883037 3915774 3871808 3887707 3900413 3915774 3915774
## [3907] 3889288 3915774 3892192 3915774 3915774 3915774 3915774 3915774 3898005
## [3916] 3915774 3892331 3892192 3901197 3895926 3915774 3915774 3895872 3915774
## [3925] 3902295 3915774 3740121 3917191 3900822 3889855 3895872 3852186 3917191
## [3934] 3895872 3852125 3917191 3748706 3916817 3814800 3917191 3821779 3820034
## [3943] 3922604 3922604 3855274 3896607 3862244 3844025 3838084 3917191 3866703
## [3952] 3844025 3838084 3922604 3919926 3862244 3862244 3922604 3922604 3922604
## [3961] 3867667 3899198 3922126 3862725 3863782 3899198 3846244 3899149 3927920
## [3970] 3898539 3899149 3922604 3912057 3871638 3928741 3855433 3928741 3928741
## [3979] 3875118 3928741 3921307 3927920 3927371 3891022 3810379 3881007 3873484
## [3988] 3889820 3885620 3744628 3647261 3663090 3633231 3702138 3725244 3669455
## [3997] 3710745 3764721 3727540 3741619 3764721 3648482 3725244 3764721 3786929
## [4006] 3707697 3815795 3812223 3726758 3728711 3734181 3786635 3801856 3672657
## [4015] 3786929 3760611 3717862 3815795 3725244 3802988 3762551 3815795 3789696
## [4024] 3809188 3784612 3784612 3815795 3735439 3815795 3802988 3815795 3723527
## [4033] 3876386 3737133 3793243 3763541 3876386 3762000 3787678 3792760 3876386
## [4042] 3818320 3792760 3889224 3849211 3876386 3809183 3816062 3874555 3786495
## [4051] 3892180 3867269 3861428 3892180 3877088 3855553 3877088 3892180 3889988
## [4060] 3868398 3784734 3892180 3867269 3849211 3892180 3867089 3887044 3852324
## [4069] 3892180 3892180 3820016 3892180 3867089 3892180 3867089 3900300 3861556
## [4078] 3872491 3900300 3837165 3867089 3900300 3887044 3867089 3867192 3892152
## [4087] 3900300 3900300 3900300 3794038 3900300 3900300 3900300 3884786 3882858
## [4096] 3904134 3880040 3856078 3900300 3886474 3833371 3878267 3904134 3900300
## [4105] 3861543 3866304 3848098 3904134 3891675 3904134 3877241 3901947 3904134
## [4114] 3891675 3901947 3904134 3904134 3904134 3888740 3894231 3903892 3904134
## [4123] 3888414 3901947 3904134 3906204 3906204 3879886 3906204 3875161 3893990
## [4132] 3880925 3885165 3886844 3875161 3880533 3887033 3906204 3869376 3904961
## [4141] 3910594 3901568 3797344 3906204 3909395 3903199 3906747 3909395 3910764
## [4150] 3906437 3906204 3910764 3909395 3910764 3910764 3910029 3910764 3819250
## [4159] 3910764 3899228 3902702 3906437 3908201 3910764 3910764 3912972 3910764
## [4168] 3826678 3910764 3912871 3905586 3879055 3910764 3912972 3910764 3912972
## [4177] 3910764 3912871 3910764 3912972 3912972 3893355 3910764 3904258 3877773
## [4186] 3912972 3753131 3811876 3830054 3681200 3685845 3644253 3753565 3645805
## [4195] 3704558 3756578 3722850 3811876 3804262 3819892 3788140 3804262 3730402
## [4204] 3796529 3753565 3685845 3806669 3749436 3819892 3765097 3794386 3777471
## [4213] 3720284 3837269 3807289 3833731 3833731 3760405 3794386 3858909 3857111
## [4222] 3833731 3756586 3794386 3833731 3835051 3771015 3867129 3791632 3844900
## [4231] 3810355 3867129 3857111 3857111 3867129 3867129 3867129 3833731 3867129
## [4240] 3867129 3865631 3844900 3838622 3889673 3882674 3889673 3867129 3832978
## [4249] 3862693 3841654 3882340 3809103 3803686 3867129 3848302 3843027 3890451
## [4258] 3838890 3863420 3890451 3889673 3883439 3899179 3822600 3889673 3806178
## [4267] 3899179 3848302 3775948 3729979 3867129 3899179 3889673 3899179 3899179
## [4276] 3897230 3899179 3724794 3904447 3889673 3904447 3778005 3904447 3754677
## [4285] 3775948 3904447 3904447 3774642 3881220 3904447 3752070 3904447 3725051
## [4294] 3904447 3805387 3904447 3883853 3838125 3904447 3904447 3886178 3904447
## [4303] 3904447 3904447 3904116 3858526 3853747 3904447 3904447 3904447 3904447
## [4312] 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447
## [4321] 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447
## [4330] 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447
## [4339] 3862776 3904447 3863542 3904447 3882340 3904447 3904447 3904447 3904447
## [4348] 3863542 3904447 3897478 3897478 3904447 3904422 3904447 3904447 3904447
## [4357] 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447 3904447
## [4366] 3904447 3893752 3904447 3819877 3920219 3904447 3942350 3880439 3952065
## [4375] 3904447 3947907 3942350 3942350 3904447 3952065 3810355 3952065 3942350
## [4384] 3906517 3947907 3721133 3762608 3677414 3762608 3615749 3600072 3701785
## [4393] 3735848 3719073 3599854 3747184 3724554 3671780 3801920 3707520 3765419
## [4402] 3728396 3724794 3736767 3801920 3722729 3801920 3801920 3753147 3735518
## [4411] 3712743 3750296 3788429 3726770 3722786 3799250 3662230 3739952 3772120
## [4420] 3727210 3747449 3732365 3801920 3749239 3722186 3814439 3801208 3727210
## [4429] 3833983 3725172 3841985 3747542 3767492 3747327 3761245 3833983 3833983
## [4438] 3833983 3796279 3761245 3833983 3762578 3839018 3796279 3833983 3828142
## [4447] 3839018 3833983 3839018 3833983 3757742 3839018 3839018 3839018 3839018
## [4456] 3839018 3839018 3839018 3839018 3839018 3839018 3839018 3833983 3833983
## [4465] 3839018 3839018 3839018 3839018 3839018 3839018 3839018 3839018 3839018
## [4474] 3839018 3839018 3839018 3839018 3839018 3839018 3839018 3839018 3839018
## [4483] 3839018 3839018 3839018 3839018 3839018 3839018 3766498 3839018 3839018
## [4492] 3839018 3839018 3839018 3839018 3690238 3839018 3766498 3839018 3839018
## [4501] 3839018 3839018 3839018 3839018 3839018 3839018 3771413 3839018 3839018
## [4510] 3839018 3839018 3839018 3839018 3839018 3833235 3847322 3839018 3839018
## [4519] 3847322 3845938 3847322 3839018 3839018 3839018 3847322 3859585 3839018
## [4528] 3845938 3839018 3839018 3833633 3839018 3839018 3859585 3839018 3839018
## [4537] 3859585 3859585 3859585 3859585 3859585 3839054 3839018 3859585 3859585
## [4546] 3859585 3859585 3859585 3859585 3838981 3859585 3735111 3859585 3859585
## [4555] 3859585 3859585 3859585 3859585 3859585 3859585 3859585 3859585 3859585
## [4564] 3859585 3797338 3859585 3855042 3859585 3811686 3859585 3861766 3870691
## [4573] 3859585 3855213 3798545 3870691 3870594 3755147 3870691 3824327 3870691
## [4582] 3870691 3870691 3870691 3870691 3870691 3870691 3870691 3870691 3870691
## [4591] 3870691 3870691 3870691 3870691 3791320 3870691 3870691 3870691 3870691
## [4600] 3870691 3870691 3870691 3782258 3870691 3870691 3870691 3791485 3870691
## [4609] 3870691 3847405 3870691 3870691 3847405 3870691 3870691 3870691 3870691
## [4618] 3847405 3847405 3864741 3870691 3870691 3870691 3870691 3870726 3870691
## [4627] 3870691 3870691 3870691 3870691 3876721 3870691 3870691 3870691 3877399
## [4636] 3855213 3877936 3813974 3877936 3857879 3858339 3813974 3870691 3870691
## [4645] 3870616 3882047 3875734 3870691 3813974 3882047 3882047 3838635 3882047
## [4654] 3838635 3882047 3882047 3870691 3882047 3817653 3882047 3882047 3863308
## [4663] 3882047 3882047 3882047 3882047 3882047 3882047 3882047 3882047 3882047
## [4672] 3871636 3882047 3882047 3778927 3882047 3755635 3881928 3876874 3882047
## [4681] 3882047 3877562 3882047 3627310 3675855 3624575 3694875 3777598 3816501
## [4690] 3585611 3624575 3754415 3819552 3819552 3585611 3675855 3694875 3760173
## [4699] 3730473 3758054 3778021 3788296 3819552 3819552 3743275 3729842 3748571
## [4708] 3825280 3805428 3736911 3824072 3748144 3748144 3764696 3688706 3736911
## [4717] 3741780 3812866 3822054 3825280 3793567 3822054 3764696 3700511 3823925
## [4726] 3834505 3838941 3825280 3838941 3838941 3691157 3822362 3776015 3800165
## [4735] 3812866 3838941 3822054 3835880 3792669 3800165 3838509 3740670 3821105
## [4744] 3818424 3838941 3835030 3839463 3822054 3798519 3813403 3838941 3838941
## [4753] 3844773 3824318 3822054 3820461 3768673 3826448 3844773 3844773 3844773
## [4762] 3844773 3808308 3844357 3836433 3869405 3861492 3869405 3869405 3844773
## [4771] 3869405 3844773 3844773 3869405 3844773 3869405 3844773 3844773 3844773
## [4780] 3855093 3869405 3869405 3844773 3844773 3869405 3873777 3869405 3844773
## [4789] 3873777 3844773 3869405 3869405 3869405 3878917 3869405 3714750 3878917
## [4798] 3878917 3869405 3873501 3878917 3813277 3878917 3878917 3873777 3878917
## [4807] 3878917 3858196 3838611 3878917 3878917 3838611 3873392 3869310 3873777
## [4816] 3854294 3878917 3859431 3845132 3856775 3860855 3859043 3861804 3843877
## [4825] 3878917 3878917 3861804 3861611 3878917 3744006 3869318 3861611 3837874
## [4834] 3861804 3861804 3853592 3878917 3863956 3867487 3861804 3861589 3864736
## [4843] 3843877 3861287 3858654 3861642 3873900 3872753 3864736 3861033 3858654
## [4852] 3878917 3861804 3878917 3873900 3878917 3856582 3876458 3869799 3861804
## [4861] 3876957 3890705 3869878 3871900 3890705 3877030 3890705 3869878 3789479
## [4870] 3886953 3890705 3887568 3890705 3843324 3890705 3890705 3883635 3876957
## [4879] 3751453 3699776 3689056 3797905 3667210 3691354 3797905 3640833 3664440
## [4888] 3667258 3703743 3689223 3796907 3691354 3682395 3776483 3739752 3797905
## [4897] 3658876 3797905 3775799 3797905 3832061 3797905 3788725 3794132 3799154
## [4906] 3759224 3779669 3788725 3797799 3735778 3775799 3810376 3800814 3809862
## [4915] 3797905 3797905 3779669 3797905 3813220 3790146 3797905 3819372 3790146
## [4924] 3788725 3835197 3835197 3810376 3806155 3825259 3821239 3789996 3809676
## [4933] 3806000 3825259 3835197 3819553 3789996 3835197 3770714 3828808 3828808
## [4942] 3835197 3825259 3765458 3796149 3836745 3834912 3835197 3825259 3838651
## [4951] 3753990 3779939 3835197 3756558 3838928 3838928 3835570 3785516 3835273
## [4960] 3827501 3827501 3808642 3818798 3845143 3845143 3802070 3859525 3832470
## [4969] 3827501 3859525 3806237 3775862 3830344 3808731 3842279 3838928 3763284
## [4978] 3808731 3858566 3706336 3869653 3802070 3842408 3743002 3828468 3809486
## [4987] 3802469 3805258 3802070 3869653 3869653 3826774 3839144 3807285 3832940
## [4996] 3869653 3845377 3837566 3832940 3868819 3830177 3870913 3839315 3869653
## [5005] 3870913 3837566 3809486 3866326 3870913 3850770 3870913 3870913 3874014
## [5014] 3838194 3808328 3837461 3749704 3874014 3838801 3874014 3870913 3818612
## [5023] 3882226 3766039 3838801 3808328 3831992 3828005 3882226 3831992 3882226
## [5032] 3870818 3889539 3870421 3889539 3833950 3837519 3840767 3889539 3829870
## [5041] 3890842 3812718 3852737 3840691 3890842 3882226 3845341 3840767 3900841
## [5050] 3900841 3827559 3845341 3875733 3716359 3890842 3835951 3900841 3870704
## [5059] 3732674 3879847 3878307 3894191 3838175 3905187 3899597 3848979 3905187
## [5068] 3888533 3905187 3888533 3912242 3905187 3912242 3905187 3903750 3912242
## [5077] 3889671 3912242 3868677 3910813 3903750 3785013 3912242 3906414 3878693
## [5086] 3910267 3878693 3912632 3900841 3764652 3764652 3763263 3604787 3740623
## [5095] 3604787 3761952 3733188 3640279 3670288 3717208 3594956 3777256 3740623
## [5104] 3803223 3745158 3785272 3717208 3712310 3640279 3803223 3722439 3713476
## [5113] 3801148 3718889 3803223 3796780 3803223 3727616 3786529 3801148 3704428
## [5122] 3803223 3745100 3875637 3660375 3786297 3803223 3786297 3803223 3810166
## [5131] 3810166 3786529 3769869 3810166 3812679 3732095 3786529 3788432 3812679
## [5140] 3806061 3812147 3814200 3812679 3747356 3785267 3814200 3796476 3818929
## [5149] 3786763 3812777 3796476 3801610 3784052 3824583 3829740 3818929 3831731
## [5158] 3840857 3749610 3841172 3868275 3821420 3828857 3816180 3831731 3841172
## [5167] 3759657 3819953 3841172 3819748 3786317 3867560 3814875 3811797 3841172
## [5176] 3867560 3828883 3839955 3839970 3867503 3818514 3811779 3807144 3867560
## [5185] 3812231 3786864 3827877 3805534 3839970 3809569 3806443 3786864 3786268
## [5194] 3839391 3824044 3795346 3867560 3802715 3867560 3851135 3758863 3837115
## [5203] 3867560 3867560 3867560 3867560 3851135 3867560 3867560 3837115 3867560
## [5212] 3851135 3867560 3867560 3867560 3867560 3867560 3867560 3867560 3867560
## [5221] 3867560 3867560 3812898 3867560 3867560 3867560 3867560 3867560 3867560
## [5230] 3867560 3867560 3867560 3867560 3867560 3867560 3867560 3867560 3867560
## [5239] 3867560 3867560 3867560 3871531 3867560 3867560 3867560 3867560 3871531
## [5248] 3867560 3876473 3876473 3867560 3867465 3867560 3867560 3867560 3867560
## [5257] 3867560 3876473 3867560 3889243 3867560 3867560 3889243 3867560 3867560
## [5266] 3868822 3892781 3892781 3867560 3867560 3887809 3876473 3876473 3802716
## [5275] 3888181 3870827 3867560 3889243 3888398 3856268 3878920 3899705 3867659
## [5284] 3867560 3875237 3880152 3843648 3899705 3878920 3710953 3899705 3851721
## [5293] 3876600 3899705 3899705 3905970 3899705 3884302 3899705 3907828 3899705
## [5302] 3889580 3828513 3912204 3798451 3899705 3899705 3901124 3899705 3899705
## [5311] 3899705 3912204 3786770 3695044 3683916 3708800 3739518 3654204 3657655
## [5320] 3730719 3786770 3711607 3708800 3795489 3734727 3760993 3687910 3698563
## [5329] 3786770 3777662 3678883 3786770 3699119 3698563 3698563 3786770 3765346
## [5338] 3699119 3782874 3757645 3837737 3657655 3837737 3699119 3837737 3678978
## [5347] 3701418 3753956 3837737 3699119 3678978 3699119 3709941 3837737 3837737
## [5356] 3699119 3752412 3837737 3797295 3785517 3837737 3810740 3670915 3664531
## [5365] 3837737 3778593 3752412 3785517 3853590 3832618 3794626 3754845 3797295
## [5374] 3952249 3873193 3803601 3755455 3785856 3797295 3747851 3784915 3730602
## [5383] 3752097 3730410 3731512 3797295 3797295 3797295 3825469 3784544 3718934
## [5392] 3784544 3737755 3853672 3744664 3724490 3797295 3820930 3745426 3717702
## [5401] 3750862 3744664 3752387 3799077 3819342 3837898 3827248 3819342 3837898
## [5410] 3734541 3797295 3798251 3867383 3820930 3771341 3820930 3820930 3787482
## [5419] 3820930 3833252 3820930 3810664 3819342 3819342 3820930 3819342 3820930
## [5428] 3819342 3786573 3839010 3769680 3819342 3841857 3820930 3839690 3769680
## [5437] 3843432 3839263 3855819 3833252 3839263 3839010 3839263 3839263 3855819
## [5446] 3850110 3834191 3835733 3839263 3834191 3840014 3850110 3840014 3844585
## [5455] 3839690 3841857 3874980 3863282 3849424 3839263 3860771 3839690 3888885
## [5464] 3850410 3830816 3860866 3859771 3830816 3888885 3855342 3888885 3874980
## [5473] 3846853 3888885 3811752 3846853 3874980 3888885 3880570 3835005 3786256
## [5482] 3888885 3801052 3880570 3848892 3888885 3868533 3888885 3880570 3813061
## [5491] 3880570 3773695 3888885 3888885 3888885 3888885 3888885 3888885 3859830
## [5500] 3880570 3888885 3880570 3880570 3880570 3802710 3888885 3888885 3888885
## [5509] 3880570 3888885 3888885 3888885 3848813 3888885 3873332 3888885 3880570
## [5518] 3888885 3888885 3888885 3880570 3787167 3888885 3888885 3828781 3888885
## [5527] 3888885 3880570 3898155 3888885 3831577 3774835 3888885 3888885 3913427
## [5536] 3888356 3888885 3847004 3888885 3913427 3811369 3681724 3888885 3770996
## [5545] 3885336 3887221 3914857 3931538 3913930 3887221 3931538 3923189 3931538
## [5554] 3922776 3906428 3931538 3889771 3914857 3931538 3905149 3924446 3928965
## [5563] 3737578 3915449 3903723 3905286 3931538 3930843 3915449 3894268 3912924
## [5572] 3897566 3925681 3941299 3903585 3905286 3941299 3925904 3894268 3902678
## [5581] 3904325 3904148 3941299 3871452 3941299 3945360 3951259 3941299 3858005
## [5590] 3906306 3918031 3945360 3791776 3918031 3896724 3938632 3940407 3951259
## [5599] 3951259 3931326 3910064 3955200 3968389 3921774 3903205 3910949 3951259
## [5608] 3963499 3968021 3951231 3948705 3951259 3956564 3952249 3779840 3968021
## [5617] 3910949 3939998 3951259 3934930 3965510 3965044 3954775 3945204 3875002
## [5626] 3951496 3962529 3951496 3951496 3910040 3965044 3919608 3865549 3951058
## [5635] 3846247 3951496 3900432 3885688 3951496 3893402 3951496 3732487 3628462
## [5644] 3806631 3661656 3765245 3628462 3797873 3797498 3732048 3701550 3797873
## [5653] 3797873 3762446 3684323 3793649 3686282 3762446 3625681 3676165 3769021
## [5662] 3644936 3680044 3797873 3767822 3701550 3747966 3797873 3710162 3811020
## [5671] 3797873 3702008 3800409 3800409 3813997 3800409 3797314 3797115 3796560
## [5680] 3813997 3725266 3813997 3797873 3710162 3801984 3745656 3816212 3813997
## [5689] 3819789 3800409 3765346 3762892 3724782 3831041 3762833 3831041 3813997
## [5698] 3845572 3832243 3845572 3845572 3845572 3762892 3832812 3820931 3767948
## [5707] 3838839 3819789 3845085 3844156 3845572 3827230 3845572 3820931 3825706
## [5716] 3833682 3845572 3845572 3854028 3844679 3785994 3833682 3845572 3820931
## [5725] 3845572 3810168 3816233 3829097 3845572 3778564 3845572 3854028 3825718
## [5734] 3860637 3812831 3854028 3863799 3817058 3810739 3854028 3736929 3854211
## [5743] 3806753 3827255 3812831 3797554 3858940 3887160 3863799 3887160 3863799
## [5752] 3887160 3885518 3887160 3807664 3887160 3820939 3887160 3887160 3849075
## [5761] 3887160 3887160 3854869 3872206 3855792 3853586 3790847 3887160 3887160
## [5770] 3835263 3790678 3835263 3907054 3896014 3757244 3804870 3907054 3858672
## [5779] 3789137 3830864 3907054 3818172 3915691 3908241 3849097 3840974 3811903
## [5788] 3919422 3928214 3919422 3921474 3897419 3919908 3919422 3919422 3932420
## [5797] 3916535 3933605 3941954 3941954 3922189 3935805 3923170 3931728 3919422
## [5806] 3950273 3933605 3941954 3950273 3950273 3941954 3930534 3933605 3950273
## [5815] 3918943 3950273 3950273 3933368 3950273 3950273 3905198 3933410 3923888
## [5824] 3950273 3937661 3905198 3930534 3935007 3938564 3950273 3936933 3905198
## [5833] 3941954 3950273 3941954 3937606 3905198 3921474 3936933 3941954 3856644
## [5842] 3828764 3786841 3929983 3786770 3744233 3762608 3811876 3890705 3952065
## [5851] 3884613 3676061 3656053 3691173 3675775 3674321 3680223 3694622 3698696
## [5860] 3702013 3694622 3694622 3698898 3702729 3704285 3747399 3731673 3704083
## [5869] 3752991 3708863 3753058 3752523 3772192 3760710 3749297 3771255 3767781
## [5878] 3760615 3786789 3766513 3792943 3794361 3779717 3777838 3788655 3793964
## [5887] 3793415 3792598 3795108 3799664 3789398 3800010 3793842 3799322 3789398
## [5896] 3800010 3801718 3793415 3800918 3795824 3794740 3800231 3799684 3802154
## [5905] 3799322 3795745 3812760 3806934 3811367 3818829 3810824 3795745 3803019
## [5914] 3815264 3811034 3810824 3818174 3812338 3807201 3815645 3824356 3814452
## [5923] 3819088 3800567 3840687 3820802 3824356 3842306 3839129 3840792 3842302
## [5932] 3839592 3828399 3843498 3849309 3847454 3843500 3848524 3846772 3848849
## [5941] 3849837 3846628 3849119 3855461 3851843 3849997 3855917 3856521 3849119
## [5950] 3852722 3844066 3860801 3856114 3856610 3860882 3869490 3863005 3863492
## [5959] 3874096 3867123 3865780 4054156 4118611 4090950 4084407 4090950 4075153
## [5968] 4101382 4118611 4088346 4083185 4094513 4077361 4074696 4075007 4038976
## [5977] 4054257 4075732 4074787 4098711 4085011 4097823 4038976 4089882 4075732
## [5986] 4078148 4078148 4078148 4078148 4078148 4078148 4078148 4078148 4078157
## [5995] 4078157 4078157 4078157 4079907 4079907 4079907 4079907 4079907 4079907
## [6004] 4078934 4078934 4078934 4078934 4078934 4078934 4080411 4080411 4080411
## [6013] 4080411 4080411 4080411 4080411 4080411 4080411 4080411 4080411 4080411
## [6022] 4080411 4080411 4074787 4074787 4074787 4074787 4074787 4074787 4075328
## [6031] 4075328 4075328 4075328 4075328 4075381 4077244 4077244 4077244 4077244
## [6040] 4077244 4077244 4077244 4093876 4093876 4093876 4093876 4093876 4093876
## [6049] 4093876 4093876 4093876 4111323 4112417 4112417 4115962 4115962 4115962
## [6058] 4104410 4104410 4104410 4104410 4104410 4104410 4104410 4104410 4104410
## [6067] 4104410 4104410 4104410 4104410 4104410 4104410 4104410 4104410 4079074
## [6076] 4079074 4080081 4080081 4080081 4080081 4080081 4080081 4080081 4080480
## [6085] 4080480 4080480 4080480 4080480 4080480 4080480 4080480 4080480 4085011
## [6094] 4085011 4085011 4085011 4086014 4086014 4086014 4087959 4088060 4088060
## [6103] 4088346 4088346 4088346 4088346 4088346 4088346 4093526 4093526 4093526
## [6112] 4093526 4093526 4093526 4093526 4093526 4093526 4093526 4093526 4093526
## [6121] 4093526 4093526 4093526 4093526 4093526 4093526 4093526 4094880 4094880
## [6130] 4094880 4082229 4082229 4082229 4082229 4082229 4082229 4082235 4082235
## [6139] 4082808 4082808 4082808 4082853 4082853 4082853 4083185 4083185 4083185
## [6148] 4087081 4087081 4087081 4087081 4087291 4089965 4089965 4089965 4089965
## [6157] 4092677 4094214 4094214 4094214 4094214 4094214 4094441 4072004 4072004
## [6166] 4072004 4072004 4072242 4072242 4072242 4072242 4072242 4072242 4072242
## [6175] 4074280 4075514 4075514 4068920 4068920 4068954 4068954 4068954 4068954
## [6184] 4069015 4069015 4069015 4069015 4069015 4074696 4074696 4074696 4074696
## [6193] 4074696 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251
## [6202] 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251
## [6211] 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251
## [6220] 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4075732
## [6229] 4075732 4075732 4075732 4075732 4075732 4075732 4075732 4075732 4075732
## [6238] 4075732 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934
## [6247] 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934
## [6256] 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934
## [6265] 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934
## [6274] 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4070676
## [6283] 4070676 4070676 4074768 4074787 4074787 4074787 4074787 4074787 4074787
## [6292] 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4074787
## [6301] 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4074787
## [6310] 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4074787
## [6319] 4074787 4074787 4074787 4074787 4074787 4074787 4074787 4097823 4097823
## [6328] 4097823 4097823 4097823 4097823 4097823 4097823 4097823 4097823 4097823
## [6337] 4097823 4097823 4097823 4097823 4097823 4097823 4097823 4097823 4097823
## [6346] 4097823 4098711 4044730 4044730 4044847 4044847 4044847 4044847 4044847
## [6355] 4044847 4052704 4088667 4088667 4088667 4088667 4092737 4092737 4092737
## [6364] 4092737 4092737 4092737 4092737 4092737 4092737 4092737 4092737 4092737
## [6373] 4092737 4092737 4092737 4092737 4092737 4092737 4092737 4092737 4092737
## [6382] 4092737 4092737 4092737 4092737 4092737 4092737 4092737 4092737 4092737
## [6391] 4092737 4092737 4092737 4093876 4093876 4093876 4093876 4093876 4093876
## [6400] 4093876 4093876 4093876 4093876 4090405 4090405 4090405 4090405 4090405
## [6409] 4090405 4090405 4090405 4090405 4090405 4090405 4090405 4090405 4090405
## [6418] 4090405 4090405 4090405 4090405 4090405 4094639 4094639 4094639 4094639
## [6427] 4094639 4094639 4094639 4094639 4094639 4094639 4094639 4094639 4094639
## [6436] 4094639 4094639 4102965 4102965 4102965 4102965 4102965 4102965 4102965
## [6445] 4038976 4038976 4038976 4038976 4038976 4038976 4038976 4038976 4038976
## [6454] 4038976 4038976 4038976 4038976 4038976 4038976 4039730 4039730 4039730
## [6463] 4039730 4039730 4039981 4039981 4039981 4039981 4078234 4078234 4078234
## [6472] 4078234 4078234 4078234 4078234 4078234 4078234 4078234 4078234 4078234
## [6481] 4078234 4078234 4078234 4078234 4078234 4078234 4078234 4078234 4078234
## [6490] 4078234 4078234 4078234 4078234 4078234 4078541 4078541 4079074 4079074
## [6499] 4079074 4079074 4079074 4079074 4079074 4079074 4082150 4082150 4082150
## [6508] 4082150 4082501 4082501 4082501 4082501 4082501 4082501 4082501 4082501
## [6517] 4082501 4082501 4082501 4083104 4083104 4083104 4083104 4083104 4083104
## [6526] 4084481 4084481 4084481 4084481 4084481 4084481 4085011 4085011 4085011
## [6535] 4085011 4085011 4085011 4085011 4085011 4089882 4089882 4089882 4089882
## [6544] 4091105 4091105 4091105 4091105 4091105 4091105 4091105 4091105 4091105
## [6553] 4091105 4091105 4091105 4091105 4091105 4091105 4091105 4091105 4091105
## [6562] 4091105 4091105 4093452 4093452 4093452 4093452 4093452 4093452 4093526
## [6571] 4093526 4093526 4093526 4093526 4093526 4093526 4093526 4093526 4093526
## [6580] 4093526 4093526 4093526 4079250 4079250 4079250 4079250 4079250 4079250
## [6589] 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4079250
## [6598] 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4079250
## [6607] 4079250 4079250 4079737 4079737 4079737 4079737 4079737 4079737 4079737
## [6616] 4079737 4079737 4079737 4079737 4079737 4082229 4082229 4082229 4082229
## [6625] 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332
## [6634] 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332
## [6643] 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332
## [6652] 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332
## [6661] 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4081332
## [6670] 4081332 4081332 4072004 4072004 4072004 4072004 4072004 4072004 4072004
## [6679] 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004
## [6688] 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004
## [6697] 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004
## [6706] 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004
## [6715] 4072004 4072004 4072004 4072004 4072004 4072004 4068920 4068920 4068920
## [6724] 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920
## [6733] 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920
## [6742] 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920
## [6751] 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920
## [6760] 4068920 4068920 4068920 4061671 4061671 4061671 4072251 4072251 4072251
## [6769] 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251 4072251
## [6778] 4072251 4072251 4072251 4072251 4072251 4078934 4078934 4078934 4078934
## [6787] 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934 4078934
## [6796] 4078934 4078934 4078934 4078934 4078934 4070676 4070676 4070676 4070676
## [6805] 4070676 4070676 4070676 4070676 4070676 4070676 4070676 4070676 4070676
## [6814] 4070676 4070676 4070676 4070676 4070676 4070676 4083707 4083707 4083707
## [6823] 4083707 4096431 4096431 4097823 4097823 4097823 4097823 4097823 4097823
## [6832] 4097823 4097823 4097823 4097823 4097823 4097823 4097823 4097823 4097823
## [6841] 4097823 4097823 4035508 4035508 4040159 4040159 4044730 4044730 4044730
## [6850] 4044730 4044730 4044730 4044730 4044730 4044730 4044730 4044730 4044730
## [6859] 4044730 4044730 4044730 4044730 4044730 4044730 4073760 4073760 4073760
## [6868] 4073760 4088667 4088667 4088667 4088667 4088667 4088667 4088667 4088667
## [6877] 4088667 4088667 4088667 4088667 4088667 4088667 4075007 4075007 4075007
## [6886] 4075007 4075007 4075007 4075007 4075007 4075007 4075007 4090405 4090405
## [6895] 4090405 4090405 4090405 4090405 4090405 4038976 4038976 4038976 4038976
## [6904] 4038976 4038976 4038976 4038976 4038976 4038976 4038976 4038976 4038976
## [6913] 4038976 4038976 4038976 4038976 4038976 4038976 4060830 4060830 4060830
## [6922] 4060830 4060830 4060830 4060830 4060830 4060830 4060830 4060830 4060830
## [6931] 4060830 4060830 4060830 4078234 4078234 4078234 4078234 4078234 4067725
## [6940] 4067725 4067725 4067725 4067725 4067725 4080200 4080200 4080200 4080200
## [6949] 4080200 4082150 4082150 4082150 4082150 4082150 4082150 4082150 4082150
## [6958] 4082150 4082150 4082150 4089882 4089882 4089882 4089882 4089882 4089882
## [6967] 4089882 4089882 4089882 4089882 4089882 4089882 4089882 4089882 4089882
## [6976] 4089882 4089882 4089882 4067588 4067588 4067588 4067588 4067588 4067588
## [6985] 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4079250
## [6994] 4079250 4079250 4079250 4079250 4079250 4079250 4079250 4065227 4065227
## [7003] 4065227 4075024 4075024 4075024 4080228 4080228 4080228 4080228 4080266
## [7012] 4081332 4081332 4081332 4081332 4081332 4081332 4081332 4072004 4072004
## [7021] 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4072004
## [7030] 4072004 4072004 4072004 4072004 4072004 4072004 4072004 4054257 4054257
## [7039] 4054257 4054257 4054257 4054257 4068920 4068920 4068920 4068920 4068920
## [7048] 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920 4068920
## [7057] 4068920 4068920 4068920 3890752 3869349 3936324 3680611 3832373 3869349
## [7066] 3797641 3894659 3731283 4049594 4037148 3862568 3837747 3883684 4037148
## [7075] 4053440 4037148 3941801 3794562 3961048 4029430 4054813 4037148 4015924
## [7084] 4038882 4002294 4002739 4019890 3988173 4055328 4002294 3950566 3965071
## [7093] 4019890 4055328 3992493 4019890 3994858 4020995 3940428 3989587 3993519
## [7102] 4020995 4008936 4055328 4042488 4028193 3899577 3843603 3992563 3992563
## [7111] 3758321 3892149 3864954 3795698 3870815 3992563 3866845 3992563 3981686
## [7120] 3994237 3939310 3958213 3991347 3931288 3870815 3992563 3938528 4023670
## [7129] 4022191 4063681 4023670 3878594 4008594 4063681 3958173 4063581 4002998
## [7138] 4057654 3981686 4063681 4063681 4015202 4045773 3958174 3931381 4063681
## [7147] 4011013 4021660 4004975 4022763 3974492 3971094 3657221 3982160 3985983
## [7156] 3642122 3939482 3739878 3655416 3655416 3985983 3975120 3916077 3761804
## [7165] 3985983 3759043 3975120 3878055 3985983 3948152 3964668 3964668 3931835
## [7174] 3982447 3872873 3911756 3980808 3911756 3966774 3985983 3979268 3977402
## [7183] 3985983 3985983 3968132 3912617 4051185 4027805 3926085 3944578 4064893
## [7192] 3933578 4064893 4036285 4037193 4013431 3991549 3949053 3917241 3776870
## [7201] 3743518 3865976 3927805 3935867 3721767 3979922 3979922 3793709 3853864
## [7210] 4037715 3922558 3924084 3927805 3882635 3892212 4037715 3894059 4037177
## [7219] 3888587 3894059 3950408 3989049 4027294 3876263 3922558 3864584 3951923
## [7228] 3840229 3879654 4011554 4037715 4008260 3892786 3989049 3969439 4027294
## [7237] 4037715 3988292 4042257 3948687 3954252 3757683 3846802 3776644 3799097
## [7246] 4006894 4026900 3981204 3796956 3840737 3773564 3952348 3951029 4017414
## [7255] 3944441 3994476 4026900 3843882 4033782 4003382 3980700 4026900 4026900
## [7264] 3902809 3958758 3980784 4026900 4018692 3955175 4059743 4020220 4012673
## [7273] 3949272 3934015 4020841 4035508 3980309 4005096 3936755 4023631 4028058
## [7282] 3649524 3913753 3766467 3767229 3971620 4046573 3773572 3773572 3811468
## [7291] 3865781 3858397 4050622 3845764 3971001 3990120 3952225 3900900 4026986
## [7300] 3967006 3935708 3798966 3990120 3901154 4050622 3921241 3939482 3985741
## [7309] 4032892 4073760 4073760 3933510 4000642 4018921 4022407 3996320 4073760
## [7318] 4023783 4073760 4019280 4073760 3998390 4050362 4027637 4003751 4073760
## [7327] 3995017 4073760 3748536 4004725 3957127 3786039 3861954 3891201 3790936
## [7336] 4008308 3896835 3964795 4006914 4006914 4037652 3903314 3941668 4037652
## [7345] 3934786 3876310 4050462 4037652 3892655 4002926 3968161 4050462 4042748
## [7354] 4034821 4039692 4019815 4037652 4044188 4052753 3999379 3999838 4021401
## [7363] 4075007 3998501 4020443 4054399 4047044 3999379 4075007 3781215 3890752
## [7372] 3812499 3881841 4010982 3925492 3760460 3842529 3880077 4011603 3955088
## [7381] 3774597 3897481 3760460 4011603 3891120 3813027 3945538 3960350 4011603
## [7390] 3781804 3956568 3880954 3923896 4011603 3914775 3930560 4011603 3826017
## [7399] 3952633 3898505 4011603 3898903 3955148 4003090 3929907 3885042 4000777
## [7408] 3871919 4011603 3905091 4010832 4038976 3976083 3904744 3827091 3907697
## [7417] 3785733 3939579 3973082 3906537 3853446 3693531 3708800 3973082 3786332
## [7426] 3945721 3929064 3904529 3984619 3929064 3878105 3892479 3968457 3992092
## [7435] 3947280 3973082 3890207 3919517 4038364 3989348 4038364 3932768 3938461
## [7444] 4060830 3975712 4000666 3991895 4037713 4014811 3992092 4007551 4038364
## [7453] 3992092 3996701 3692133 4044402 4044769 4044402 3768072 3866472 3777498
## [7462] 3999151 3932045 4067725 3905744 4067725 4005408 3996701 4019733 4019733
## [7471] 3905744 3996701 3921527 4050642 4052292 4052292 4044402 4067725 3909954
## [7480] 3971850 4043697 4014618 3918397 3991618 4013991 4067725 4010304 4046583
## [7489] 4067725 3994519 3977320 4035312 3957195 3914534 3977150 3829137 3653880
## [7498] 4011443 3893804 3907863 3819214 4026802 3819251 3824948 3792226 4077798
## [7507] 3924170 3918406 3824948 3980836 3886500 3954895 4077798 3827432 4077798
## [7516] 3962752 4013194 4077798 4070958 4002600 3954895 4070071 4037535 4077798
## [7525] 4077798 4002087 4037224 4077798 4070958 3989033 4007554 4089634 4048223
## [7534] 4077798 3966706 3962752 3949289 4077798 3885145 4048223 4089882 4077798
## [7543] 3944980 3750913 3733924 3921471 3777296 3788256 3944980 3857047 3913832
## [7552] 4033801 4032044 3868888 4033801 3769411 4033801 3987624 4033801 3924643
## [7561] 3944430 4033801 3854834 3962310 4040857 4033801 4036712 4010192 4049752
## [7570] 4046885 4049752 4008594 4040857 3996876 3944430 4040857 4010726 4043951
## [7579] 3987472 3952080 4046885 3969759 3948837 4046418 3974463 3748625 3902053
## [7588] 3978314 3802846 3879292 3906299 3905023 3835649 4037617 4040140 3894266
## [7597] 4036641 3919066 4040140 3950652 4036641 4026196 4040140 3984677 4026196
## [7606] 3929102 4027965 4008556 3963112 4065227 3929102 4071959 4043079 3997985
## [7615] 4036796 3976542 4016453 4065227 4016906 4005836 4065227 4065227 3984364
## [7624] 4026539 4011013 4035300 4065227 4009748 3988840 4065227 4050270 4029339
## [7633] 3869025 3913943 3968862 3794442 3842825 3781307 3922606 3731186 3909817
## [7642] 3879767 3852996 3861033 3979256 3900243 3889790 3988042 3744505 3881655
## [7651] 3878078 3967933 4037580 3981810 3833140 4037580 4002123 4023403 3841896
## [7660] 4021745 4039034 3966809 3982077 3987130 4057813 3939068 3982093 3990846
## [7669] 3997639 4057813 4057813 3948780 3989902 4039034 4055563 4055610 4057813
## [7678] 4010893 3936894 3967578 3953285 3889361 3854019 3998565 3815894 3776585
## [7687] 3873384 3815669 3991767 3998565 3993296 3781310 3760386 3793390 3959840
## [7696] 3781310 4005078 3998565 3985581 3957058 3940385 3966548 4039546 3921494
## [7705] 3972128 4084417 4039546 3984702 3896655 3968337 4018766 3968881 3934340
## [7714] 4011118 3984702 4048162 3923298 4032375 4026406 3966901 4034786 4040497
## [7723] 4032418 4030858 4046736 4045200 4056980 4074954 4067272 4088446 4067030
## [7732] 4054853 4077985 4077043 4064373 4064935 4090710 4086203 4069241 4080146
## [7741] 4092823 4086044 4054853 4083300 4090710 4092823 4087965 4064935 4064935
## [7750] 4064935 4065176 4065176 4065755 4065755 4065755 4065755 4065755 4065755
## [7759] 4065755 4065755 4066752 4066752 4066752 4066752 4066752 4066752 4086203
## [7768] 4086757 4086757 4086757 4086757 4086757 4087133 4087133 4090093 4090093
## [7777] 4090093 4091034 4091034 4091034 4091034 4091034 4055822 4055822 4057303
## [7786] 4057647 4057647 4057647 4057647 4057647 4057647 4057647 4057647 4057647
## [7795] 4057647 4057647 4057647 4057647 4057647 4057647 4080633 4080633 4085046
## [7804] 4085046 4085046 4085046 4085046 4085046 4085046 4085046 4085046 4085046
## [7813] 4085046 4085046 4085046 4085046 4085046 4085046 4085046 4085046 4085046
## [7822] 4085046 4083692 4083692 4084184 4084184 4084184 4084184 4084184 4084184
## [7831] 4084184 4084184 4084184 4084184 4084184 4084184 4084184 4084184 4084184
## [7840] 4084184 4084184 4084184 4084184 4104925 4104925 4104925 4104925 4104925
## [7849] 4104925 4104925 4104925 4104925 4104925 4105372 4105372 4108392 4108392
## [7858] 4108392 4108392 4108392 4108392 4108392 4108392 4108392 4077117 4077117
## [7867] 4077117 4077117 4077117 4177659 4177659 4177659 4177659 4177659 4177659
## [7876] 4177659 4177659 4177659 4177659 4177659 4177659 4177659 4177659 4177659
## [7885] 4177659 4177659 4058625 4058625 4058625 4058625 4059047 4059047 4059047
## [7894] 4059047 4059047 4059047 4059134 4059134 4059134 4059134 4059134 4059134
## [7903] 4059171 4092506 4092752 4092752 4092752 4092752 4092752 4092752 4092752
## [7912] 4092752 4092752 4092752 4092752 4092752 4092752 4092752 4092752 4093544
## [7921] 4093544 4093544 4093544 4093544 4093544 4093544 4093544 4093544 4093544
## [7930] 4069241 4069241 4069241 4069241 4069472 4069472 4069472 4069472 4069472
## [7939] 4069472 4069476 4069476 4069476 4069476 4069476 4069476 4069476 4069476
## [7948] 4069518 4082874 4082874 4082874 4082874 4082874 4082874 4082874 4082874
## [7957] 4082874 4082874 4082874 4082874 4153355 4153355 4153355 4153355 4153355
## [7966] 4153355 4153355 4153355 4153355 4075695 4075695 4075695 4075695 4075695
## [7975] 4075695 4075695 4075927 4078013 4078038 4078038 4079319 4079319 4079319
## [7984] 4081240 4081240 4081240 4081240 4080146 4080146 4080146 4080146 4080146
## [7993] 4080146 4081052 4081187 4081187 4081505 4081505 4081505 4081505 4081505
## [8002] 4081505 4081505 4081505 4081505 4081505 4092823 4092823 4092823 4092823
## [8011] 4092823 4092823 4092823 4094154 4094154 4094154 4095738 4095738 4095738
## [8020] 4095738 4095738 4095738 4087965 4087965 4087965 4087965 4087965 4087965
## [8029] 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965
## [8038] 4087965 4087965 4087965 4087965 4086044 4086044 4086044 4086044 4086044
## [8047] 4086044 4086044 4086044 4086044 4086044 4086044 4086044 4086044 4086982
## [8056] 4086982 4086982 4086982 4086982 4095757 4095757 4098196 4098196 4098196
## [8065] 4098196 4098196 4098196 4098196 4098196 4098196 4098196 4098196 4098196
## [8074] 4098196 4098196 4098196 4098196 4098196 4098196 4081784 4081784 4081784
## [8083] 4081784 4081784 4081784 4081784 4081784 4084597 4084597 4084597 4084597
## [8092] 4084597 4084597 4084597 4084597 4086203 4086203 4086203 4086203 4054853
## [8101] 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853
## [8110] 4054853 4054853 4055822 4055822 4055822 4055822 4055822 4055822 4055822
## [8119] 4055822 4055822 4055822 4055822 4055822 4079023 4079023 4079023 4079023
## [8128] 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023
## [8137] 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4080633
## [8146] 4080633 4080633 4080633 4080633 4080633 4080633 4080633 4080633 4080633
## [8155] 4080633 4083300 4083300 4083300 4083300 4083300 4083300 4083692 4083692
## [8164] 4083692 4083692 4083692 4083692 4083692 4083692 4083692 4083692 4083692
## [8173] 4083692 4083692 4102513 4102513 4102513 4102513 4102513 4103751 4103751
## [8182] 4103751 4103967 4103967 4103967 4103967 4103967 4103967 4103967 4103967
## [8191] 4103967 4103967 4104925 4104925 4104925 4076148 4076148 4076148 4077117
## [8200] 4077117 4077117 4077117 4077117 4077117 4077117 4077117 4077117 4077117
## [8209] 4077117 4077117 4077117 4077117 4077117 4077117 4077117 4058625 4058625
## [8218] 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625
## [8227] 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625
## [8236] 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4090710 4090710
## [8245] 4090710 4091571 4091571 4091571 4091571 4091571 4091571 4091571 4091571
## [8254] 4091571 4091571 4091571 4091738 4091738 4091738 4091738 4091738 4091738
## [8263] 4091738 4091738 4091738 4091738 4091738 4091738 4091738 4091738 4066975
## [8272] 4066975 4066975 4066975 4066975 4066975 4066975 4066975 4066975 4066975
## [8281] 4069241 4069241 4069241 4069241 4069241 4069241 4079357 4079357 4079357
## [8290] 4079357 4079357 4079357 4079357 4079357 4079357 4082720 4082720 4082720
## [8299] 4082720 4082720 4082874 4082874 4082874 4068559 4068559 4068559 4068559
## [8308] 4068559 4068559 4068559 4068559 4068559 4068559 4068559 4068559 4068559
## [8317] 4070186 4070186 4070186 4075604 4075695 4075695 4067272 4067272 4067272
## [8326] 4067272 4067272 4068208 4068208 4068208 4068208 4068208 4068208 4073480
## [8335] 4073480 4073480 4073480 4077980 4077980 4077980 4077980 4077980 4077980
## [8344] 4079216 4079216 4079216 4079216 4079216 4079216 4079216 4079216 4080146
## [8353] 4092823 4092823 4092823 4092823 4092823 4092823 4092823 4092823 4092823
## [8362] 4092823 4092823 4092823 4092823 4092823 4092823 4092823 4092823 4092823
## [8371] 4092823 4092823 4092823 4087965 4087965 4087965 4087965 4087965 4087965
## [8380] 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965
## [8389] 4087965 4087965 4087965 4087965 4087965 4085615 4085615 4085615 4085615
## [8398] 4085615 4085615 4085615 4085615 4085615 4085615 4085615 4085615 4085615
## [8407] 4085615 4085615 4085615 4086044 4086044 4086044 4086044 4086044 4086044
## [8416] 4092238 4092238 4092238 4092238 4092238 4092238 4092238 4092238 4092238
## [8425] 4092238 4095757 4095757 4095757 4095757 4095757 4095757 4095757 4095757
## [8434] 4095757 4095757 4095757 4095757 4095757 4064935 4064935 4064935 4064935
## [8443] 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935
## [8452] 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935
## [8461] 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935
## [8470] 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4064935
## [8479] 4064935 4064935 4064935 4064935 4064935 4064935 4064935 4067030 4067030
## [8488] 4067030 4067030 4067030 4067030 4081784 4081784 4081784 4081784 4081784
## [8497] 4081784 4081784 4081784 4081784 4081784 4081784 4081784 4081784 4081784
## [8506] 4081784 4081784 4081784 4081784 4081784 4081784 4081784 4081784 4081784
## [8515] 4081784 4081784 4081784 4081784 4081784 4081784 4081784 4081784 4081784
## [8524] 4081784 4081784 4081784 4081784 4054853 4054853 4054853 4054853 4054853
## [8533] 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853
## [8542] 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853
## [8551] 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853
## [8560] 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4054853
## [8569] 4054853 4054853 4054853 4054853 4054853 4054853 4054853 4079023 4079023
## [8578] 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023
## [8587] 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023
## [8596] 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023
## [8605] 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023 4079023
## [8614] 4069538 4069538 4069538 4069538 4069538 4069538 4069538 4069538 4069538
## [8623] 4069538 4069538 4069538 4083300 4083300 4083300 4083300 4083300 4083300
## [8632] 4083300 4083300 4083300 4083300 4083300 4083300 4083300 4083300 4083300
## [8641] 4083300 4083300 4083300 4083300 4083300 4083300 4088446 4088446 4088446
## [8650] 4088446 4088446 4088446 4088446 4088446 4088446 4088446 4088446 4088446
## [8659] 4101219 4101219 4101219 4101219 4101219 4101219 4101219 4101219 4102513
## [8668] 4102513 4102513 4102513 4102513 4102513 4102513 4102513 4102513 4102513
## [8677] 4102513 4102513 4102513 4102513 4102513 4102513 4102513 4102513 4102513
## [8686] 4068026 4076148 4076148 4076148 4076148 4076148 4076148 4076148 4076148
## [8695] 4076148 4076148 4076148 4076148 4076148 4076148 4076148 4076148 4076148
## [8704] 4076148 4076148 4076148 4076148 4076148 4076148 4076148 4076148 4076148
## [8713] 4076148 4076148 4076148 4076148 4058625 4058625 4058625 4058625 4058625
## [8722] 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625
## [8731] 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625 4058625
## [8740] 4058625 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710
## [8749] 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710
## [8758] 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710
## [8767] 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710 4090710
## [8776] 4090710 4090710 4063623 4063623 4063623 4063623 4063623 4063623 4063623
## [8785] 4063623 4063623 4063623 4063623 4063623 4063623 4063623 4063623 4063623
## [8794] 4063623 4063623 4063623 4063623 4063623 4063623 4063623 4063623 4063623
## [8803] 4063623 4067659 4067659 4079357 4079357 4079357 4079357 4079357 4079357
## [8812] 4079357 4079357 4079357 4079357 4079357 4079357 4079357 4079357 4079357
## [8821] 4079357 4079357 4079357 4079357 4079357 4079357 4079357 4079357 4079357
## [8830] 4079357 4079357 4079357 4079357 4060321 4060321 4060321 4060321 4060321
## [8839] 4068559 4068559 4068559 4068559 4068559 4068559 4068559 4068559 4068559
## [8848] 4068559 4068559 4068559 4068559 4068559 4068559 4068559 4068559 4068559
## [8857] 4068559 4068559 4067272 4067272 4067272 4067272 4067272 4067272 4067272
## [8866] 4067272 4067272 4067272 4067272 4067272 4067272 4067272 4067272 4067272
## [8875] 4067272 4067272 4067272 4067272 4067272 4067272 4067272 4067272 4067272
## [8884] 4064373 4064373 4064373 4064373 4064373 4064373 4077980 4077980 4077980
## [8893] 4077980 4077980 4077980 4077980 4077980 4077980 4077980 4077980 4077980
## [8902] 4077980 4077980 4077980 4077980 4077980 4077980 4077980 4077980 4077980
## [8911] 4076383 4076383 4076383 4076383 4076383 4076383 4092823 4092823 4092823
## [8920] 4092823 4092823 4092823 4092823 4092823 4092823 4092823 4092823 4092823
## [8929] 4092823 4092823 4092823 4092823 4092823 4092823 4073777 4073777 4073777
## [8938] 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965
## [8947] 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4087965 4077043
## [8956] 4077043 4077043 4085615 4085615 4085615 4085615 4085615 4085615 4085615
## [8965] 4085615 4085615 4085615 4085615 4085615 4085615 4085615 4085615 4085615
## [8974] 4085615 4085615 4085615 4085615 4085615 4077985 4077985 4077985 4077985
## [8983] 4077985 4077985 4077985 4077985 4077985 4077985 4077985 4077985 4077985
## [8992] 4077985 4077985 4077985 4077985 4092238 4092238 4092238 4092238 4092238
## [9001] 4092238 4092238 4025460 4057534 4050114 3874678 3990876 4025727 4052638
## [9010] 4052638 4010944 3954679 3972717 4010893 3977199 4030544 3995936 3950128
## [9019] 4054392 4039405 4054392 4053337 4079023 4049528 4079023 4028345 4000427
## [9028] 3989932 4028345 4074613 4088446 4032454 4036793 3950745 4021579 4058625
## [9037] 4015460 4011686 4090710 3983537 3859704 4012483 3840559 4067659 4058036
## [9046] 4006649 4039689 3966877 3960545 3872627 3945020 3968881 4076383 4025810
## [9055] 3994632 4073777 4042363 3965466 3989516 4032380 4062337 4090621 4082265
## [9064] 4065224 4042785 4068500 4068500 4069665 4069665 4069665 4069665 4069665
## [9073] 4069665 4069665 4060395 4055721 4055721 4055721 4055721 4055721 4055721
## [9082] 4055721 4055721 4055721 4072094 4033538 4043157 4062366 4062366 4062366
## [9091] 4062366 4062366 4062366 4062366 4062366 4062366 4022005 4012574 4051999
## [9100] 4051999 4074280 4074280 4074280 4074280 4074280 4074280 4074280 4074280
## [9109] 4074280 4074280 4074280 4078565 4080573 3969181 3984898 4055594 4055594
## [9118] 4055594 4055594 4055594 4055594 4067622 4068246 4068246 4068246 4068246
## [9127] 4080926 4080926 4080926 4080926 4002399 4006458 4053956 4053956 4055387
## [9136] 4055387 4056048 4056048 4056048 4056048 4056048 4030901 4072387 4072387
## [9145] 4072387 4072387 4072387 4037281 4065604 4065604 4065604 4065604 4065604
## [9154] 4090621 4090621 4090621 4090621 4090621 4090621 4090621 4090621 4028596
## [9163] 4018202 4055328 4055328 4055328 4055897 4090503 4090503 4090503 4090503
## [9172] 4090503 4090503 4078753 4078753 4078753 4078753 4078753 4078753 4078753
## [9181] 4078753 4078753 4078753 4078753 4051141 4064736 4064736 4064736 4064736
## [9190] 4064736 4064736 4064736 4064736 4064736 4064736 4064736 4064736 4064736
## [9199] 4064736 4064736 3977909 4003335 4060766 4060766 4005908 4087563 4087563
## [9208] 4087563 4087563 4087563 4087563 4087563 4087563 4087563 4087563 4087563
## [9217] 4054038 4072856 4072856 4072856 4072856 4072856 4072856 4072856 4072856
## [9226] 4072856 4072856 4085145 4085145 4085145 4085145 4085145 4085145 4085145
## [9235] 4085145 4085145 4085145 4085723 4063874 4063874 4063874 4069643 4069643
## [9244] 4069643 4071601 4071601 4071601 4071601 4071601 4072667 4062170 4062170
## [9253] 4062337 4062337 4062337 4062337 4062337 4062337 4062337 4062337 3797332
## [9262] 4045751 4071977 4071977
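# Error of the predictions y on the test set. As parenthesized, sum(...)^2 squares the
# summed residuals; RMSE proper would be sqrt(sum((y-dataTE$Total_Power)^2)/length(dataTE$Total_Power)).
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power))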
## [1] 104506.9
mean(dataTE$Total_Power) # the test-set mean is 3,942,322; the error value computed above is 104,507
## [1] 3942322
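The error expression above is worth a closer look, because where the square is placed changes what is measured. Below is a minimal sketch, assuming the goal is the usual root mean square error. The helper `rmse` and the toy vectors `pred` and `actual` are hypothetical and not part of the original analysis. They only illustrate the difference between squaring each residual and squaring the sum of the residuals.

```r
# Hypothetical helper (an assumption, not the author's code): root mean square error.
rmse <- function(pred, actual) {
  stopifnot(length(pred) == length(actual))
  sqrt(mean((pred - actual)^2))  # square each residual, average, then take the root
}

# Toy vectors, made up purely for illustration.
pred   <- c(10, 12, 9, 11)
actual <- c(11, 10, 10, 12)

# Conventional RMSE: residuals are squared individually, so they cannot cancel.
rmse_value <- rmse(pred, actual)

# Same parenthesization as the expression above: the residuals are summed first,
# so positive and negative errors cancel before squaring.
summed_first <- sqrt(sum(pred - actual)^2 / length(actual))

print(c(rmse = rmse_value, summed_first = summed_first))
```

With the objects used in this report the call would be `rmse(y, dataTE$Total_Power)`. Its value is not reported here because it was not part of the original run.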
###rpart

tree1=rpart(dataTR$Total_Power~.,method="anova",data=dataTR,maxdepth=4,xval=10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Total_Power ~ ., data = dataTR, method = "anova", 
##     maxdepth = 4, xval = 10)
##   n= 26779 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.36278557      0 1.0000000 1.0000802 0.007211135
## 2  0.08152149      1 0.6372144 0.6373931 0.007604781
## 3  0.04833354      2 0.5556929 0.5559185 0.006623577
## 4  0.04293438      3 0.5073594 0.5076175 0.005956977
## 5  0.03201137      4 0.4644250 0.4647428 0.005546970
## 6  0.02321593      5 0.4324136 0.4327462 0.005367620
## 7  0.01519979      6 0.4091977 0.4099793 0.004957639
## 8  0.01468595      7 0.3939979 0.3952936 0.004927428
## 9  0.01354492      8 0.3793120 0.3823965 0.004866050
## 10 0.01000000      9 0.3657670 0.3665986 0.004831904
## 
## Variable importance
## Y46 Y45  Y4  Y3 X49 Y47  X8  Y7  X7  Y8  X6 X11 X12  X5 X10  Y6 Y12  X9 Y10 X42 
##  13  12  11  10  10  10   4   3   3   3   2   2   1   1   1   1   1   1   1   1 
## X44 X43  X1  X2 Y35 Y36 
##   1   1   1   1   1   1 
## 
## Node number 1: 26779 observations,    complexity param=0.3627856
##   mean=3936837, MSE=1.512747e+10 
##   left son=2 (12983 obs) right son=3 (13796 obs)
##   Primary splits:
##       Y46 < 875.95  to the right, improve=0.3627856, (0 missing)
##       Y45 < 843.24  to the right, improve=0.3548318, (0 missing)
##       Y4  < 101.5   to the left,  improve=0.3292959, (0 missing)
##       Y3  < 74.275  to the left,  improve=0.3183650, (0 missing)
##       Y47 < 912.745 to the right, improve=0.2426537, (0 missing)
##   Surrogate splits:
##       Y45 < 843.24  to the right, agree=0.966, adj=0.929, (0 split)
##       Y4  < 101.5   to the left,  agree=0.952, adj=0.900, (0 split)
##       Y3  < 74.275  to the left,  agree=0.917, adj=0.828, (0 split)
##       Y47 < 894.03  to the right, agree=0.906, adj=0.805, (0 split)
##       X49 < 42.165  to the right, agree=0.903, adj=0.800, (0 split)
## 
## Node number 2: 12983 observations,    complexity param=0.03201137
##   mean=3860471, MSE=6.426806e+09 
##   left son=4 (8919 obs) right son=5 (4064 obs)
##   Primary splits:
##       Y7  < 1.025   to the right, improve=0.1554156, (0 missing)
##       Y23 < 594.465 to the left,  improve=0.1387768, (0 missing)
##       X7  < 989.66  to the left,  improve=0.1387669, (0 missing)
##       Y37 < 843.14  to the left,  improve=0.1363654, (0 missing)
##       Y43 < 987.355 to the left,  improve=0.1235102, (0 missing)
##   Surrogate splits:
##       Y5  < 0.005   to the right, agree=0.790, adj=0.330, (0 split)
##       Y21 < 252.415 to the right, agree=0.783, adj=0.308, (0 split)
##       X42 < 949.225 to the left,  agree=0.782, adj=0.304, (0 split)
##       Y13 < 108.185 to the right, agree=0.780, adj=0.299, (0 split)
##       Y12 < 113.885 to the right, agree=0.780, adj=0.298, (0 split)
## 
## Node number 3: 13796 observations,    complexity param=0.08152149
##   mean=4008702, MSE=1.266275e+10 
##   left son=6 (10871 obs) right son=7 (2925 obs)
##   Primary splits:
##       Y8  < 0.005   to the right, improve=0.1890390, (0 missing)
##       Y7  < 0.115   to the right, improve=0.1705424, (0 missing)
##       X42 < 346.155 to the right, improve=0.1552648, (0 missing)
##       X8  < 975     to the left,  improve=0.1485809, (0 missing)
##       X41 < 424.65  to the right, improve=0.1463478, (0 missing)
##   Surrogate splits:
##       Y7  < 17.325  to the right, agree=0.955, adj=0.786, (0 split)
##       X8  < 975     to the left,  agree=0.953, adj=0.780, (0 split)
##       X7  < 945.115 to the left,  agree=0.898, adj=0.519, (0 split)
##       Y6  < 36.57   to the right, agree=0.885, adj=0.459, (0 split)
##       Y46 < 874.895 to the left,  agree=0.861, adj=0.346, (0 split)
## 
## Node number 4: 8919 observations,    complexity param=0.01468595
##   mean=3839138, MSE=6.018791e+09 
##   left son=8 (8129 obs) right son=9 (790 obs)
##   Primary splits:
##       Y36 < 888.9   to the left,  improve=0.11082490, (0 missing)
##       Y37 < 849.52  to the left,  improve=0.09979984, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08753593, (0 missing)
##       X38 < 301.375 to the right, improve=0.08475904, (0 missing)
##       Y31 < 762.785 to the left,  improve=0.08455488, (0 missing)
##   Surrogate splits:
##       Y22 < 597.305 to the left,  agree=0.950, adj=0.432, (0 split)
##       Y41 < 732.145 to the right, agree=0.939, adj=0.306, (0 split)
##       Y35 < 502.59  to the right, agree=0.938, adj=0.300, (0 split)
##       Y14 < 3.185   to the right, agree=0.936, adj=0.280, (0 split)
##       Y28 < 228.085 to the right, agree=0.936, adj=0.272, (0 split)
## 
## Node number 5: 4064 observations,    complexity param=0.01519979
##   mean=3907291, MSE=4.131366e+09 
##   left son=10 (1113 obs) right son=11 (2951 obs)
##   Primary splits:
##       X7  < 989.04  to the left,  improve=0.3667337, (0 missing)
##       Y42 < 722.315 to the right, improve=0.2568592, (0 missing)
##       Y45 < 954.95  to the left,  improve=0.2413413, (0 missing)
##       Y49 < 924.325 to the right, improve=0.2015504, (0 missing)
##       Y37 < 880.775 to the left,  improve=0.2007039, (0 missing)
##   Surrogate splits:
##       X6  < 777.195 to the left,  agree=0.861, adj=0.494, (0 split)
##       X49 < 704.45  to the left,  agree=0.838, adj=0.407, (0 split)
##       X22 < 125     to the right, agree=0.824, adj=0.358, (0 split)
##       X8  < 625     to the right, agree=0.822, adj=0.351, (0 split)
##       Y15 < 175     to the left,  agree=0.822, adj=0.350, (0 split)
## 
## Node number 6: 10871 observations,    complexity param=0.04833354
##   mean=3983323, MSE=1.278124e+10 
##   left son=12 (4919 obs) right son=13 (5952 obs)
##   Primary splits:
##       X11 < 689.855 to the left,  improve=0.1409181, (0 missing)
##       X44 < 232.635 to the right, improve=0.1344166, (0 missing)
##       X42 < 346.165 to the right, improve=0.1323906, (0 missing)
##       X41 < 424.65  to the right, improve=0.1261570, (0 missing)
##       X47 < 489.785 to the right, improve=0.1257598, (0 missing)
##   Surrogate splits:
##       X12 < 632.375 to the left,  agree=0.946, adj=0.880, (0 split)
##       X10 < 746.095 to the left,  agree=0.913, adj=0.807, (0 split)
##       X9  < 775     to the left,  agree=0.845, adj=0.658, (0 split)
##       Y12 < 153.435 to the right, agree=0.789, adj=0.533, (0 split)
##       Y10 < 150.61  to the right, agree=0.786, adj=0.528, (0 split)
## 
## Node number 7: 2925 observations
##   mean=4103024, MSE=9.320662e+08 
## 
## Node number 8: 8129 observations,    complexity param=0.01354492
##   mean=3831086, MSE=5.558648e+09 
##   left son=16 (6793 obs) right son=17 (1336 obs)
##   Primary splits:
##       Y35 < 765.05  to the left,  improve=0.12143140, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08900714, (0 missing)
##       X47 < 25      to the right, improve=0.08094458, (0 missing)
##       X5  < 970     to the left,  improve=0.08011203, (0 missing)
##       Y41 < 884.295 to the left,  improve=0.07540451, (0 missing)
##   Surrogate splits:
##       Y34 < 766.615 to the left,  agree=0.910, adj=0.455, (0 split)
##       Y33 < 775     to the left,  agree=0.877, adj=0.249, (0 split)
##       Y42 < 925     to the left,  agree=0.875, adj=0.238, (0 split)
##       X30 < 8.765   to the right, agree=0.863, adj=0.167, (0 split)
##       Y41 < 927.465 to the left,  agree=0.860, adj=0.147, (0 split)
## 
## Node number 9: 790 observations
##   mean=3921985, MSE=3.222901e+09 
## 
## Node number 10: 1113 observations
##   mean=3843910, MSE=2.99382e+09 
## 
## Node number 11: 2951 observations
##   mean=3931195, MSE=2.473852e+09 
## 
## Node number 12: 4919 observations,    complexity param=0.04293438
##   mean=3936640, MSE=1.519331e+10 
##   left son=24 (2716 obs) right son=25 (2203 obs)
##   Primary splits:
##       X6  < 848.785 to the left,  improve=0.2327215, (0 missing)
##       X5  < 995     to the left,  improve=0.2205077, (0 missing)
##       X8  < 816.01  to the left,  improve=0.2144716, (0 missing)
##       X1  < 575     to the left,  improve=0.1638708, (0 missing)
##       X39 < 325     to the right, improve=0.1629009, (0 missing)
##   Surrogate splits:
##       X5 < 942.285 to the left,  agree=0.981, adj=0.957, (0 split)
##       X8 < 816.01  to the left,  agree=0.979, adj=0.954, (0 split)
##       X7 < 875.59  to the left,  agree=0.953, adj=0.895, (0 split)
##       X1 < 575     to the left,  agree=0.770, adj=0.488, (0 split)
##       X2 < 375     to the left,  agree=0.768, adj=0.481, (0 split)
## 
## Node number 13: 5952 observations,    complexity param=0.02321593
##   mean=4021905, MSE=7.498166e+09 
##   left son=26 (1958 obs) right son=27 (3994 obs)
##   Primary splits:
##       X44 < 232.48  to the right, improve=0.2107311, (0 missing)
##       X43 < 289.965 to the right, improve=0.2058388, (0 missing)
##       X42 < 346.155 to the right, improve=0.1741573, (0 missing)
##       X47 < 689.725 to the right, improve=0.1671233, (0 missing)
##       X46 < 746.075 to the right, improve=0.1531187, (0 missing)
##   Surrogate splits:
##       X43 < 289.945 to the right, agree=0.968, adj=0.904, (0 split)
##       X42 < 346.155 to the right, agree=0.882, adj=0.640, (0 split)
##       X41 < 425     to the right, agree=0.865, adj=0.589, (0 split)
##       Y43 < 821     to the left,  agree=0.769, adj=0.298, (0 split)
##       Y44 < 825     to the left,  agree=0.769, adj=0.297, (0 split)
## 
## Node number 16: 6793 observations
##   mean=3819564, MSE=4.723845e+09 
## 
## Node number 17: 1336 observations
##   mean=3889670, MSE=5.696214e+09 
## 
## Node number 24: 2716 observations
##   mean=3883087, MSE=1.584461e+10 
## 
## Node number 25: 2203 observations
##   mean=4002664, MSE=6.495363e+09 
## 
## Node number 26: 1958 observations
##   mean=3965132, MSE=1.048974e+10 
## 
## Node number 27: 3994 observations
##   mean=4049737, MSE=3.676873e+09
y <- predict(tree1,newdata=dataTE)
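# As written, the ^2 squares the summed residuals, so this is |sum(residuals)| / sqrt(n), not the RMSE.
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power))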
## [1] 892921.9
mean(dataTE$Total_Power)
## [1] 3942322
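As a rough sanity check, the summary output for the `maxdepth=4` tree already contains enough to estimate the training RMSE. For an `anova` tree, `rel error` is the tree's residual sum of squares divided by that of the root node, so multiplying it by the root-node MSE gives the training MSE. The sketch below uses only the two numbers printed above (root MSE and the last `rel error` row); the variable names are illustrative.

```r
# Values copied from the summary(tree1) output for maxdepth = 4.
root_mse  <- 1.512747e10   # MSE at node 1 (variance of Total_Power in dataTR)
rel_error <- 0.3657670     # "rel error" in the last row of the CP table

# RMSE of always predicting the training mean.
baseline_rmse <- sqrt(root_mse)

# Training RMSE implied by the fitted tree.
train_rmse <- sqrt(rel_error * root_mse)

c(baseline_rmse = baseline_rmse, train_rmse = train_rmse)
```

Both figures are far below the 892,921.9 printed above, which suggests that value reflects the squared sum of residuals rather than a true RMSE; the test-set RMSE would need to be recomputed to confirm.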
# Same model with the depth cap raised to 5.
tree1 <- rpart(dataTR$Total_Power ~ ., method = "anova", data = dataTR, maxdepth = 5, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Total_Power ~ ., data = dataTR, method = "anova", 
##     maxdepth = 5, xval = 10)
##   n= 26779 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.36278557      0 1.0000000 1.0000807 0.007211639
## 2  0.08152149      1 0.6372144 0.6373043 0.007604259
## 3  0.04833354      2 0.5556929 0.5558509 0.006623312
## 4  0.04293438      3 0.5073594 0.5075591 0.005957143
## 5  0.03201137      4 0.4644250 0.4646950 0.005547349
## 6  0.02518016      5 0.4324136 0.4328055 0.005368977
## 7  0.02321593      6 0.4072335 0.4126473 0.005208658
## 8  0.01519979      7 0.3840176 0.3861652 0.004729450
## 9  0.01468595      8 0.3688178 0.3736408 0.004709602
## 10 0.01354492      9 0.3541318 0.3562581 0.004617328
## 11 0.01148675     10 0.3405869 0.3421944 0.004592054
## 12 0.01000000     11 0.3291001 0.3316471 0.004474801
## 
## Variable importance
## Y46 Y45  Y4  Y3 X49 Y47  X8  Y7  X7  Y8  X6 X11 Y10 X12  X5 X10  Y6 Y12 Y13  X9 
##  13  11  10  10   9   9   4   3   3   3   2   2   1   1   1   1   1   1   1   1 
## Y15 Y14 Y16 X42 X44 X43  X1  X2 Y35 Y11 
##   1   1   1   1   1   1   1   1   1   1 
## 
## Node number 1: 26779 observations,    complexity param=0.3627856
##   mean=3936837, MSE=1.512747e+10 
##   left son=2 (12983 obs) right son=3 (13796 obs)
##   Primary splits:
##       Y46 < 875.95  to the right, improve=0.3627856, (0 missing)
##       Y45 < 843.24  to the right, improve=0.3548318, (0 missing)
##       Y4  < 101.5   to the left,  improve=0.3292959, (0 missing)
##       Y3  < 74.275  to the left,  improve=0.3183650, (0 missing)
##       Y47 < 912.745 to the right, improve=0.2426537, (0 missing)
##   Surrogate splits:
##       Y45 < 843.24  to the right, agree=0.966, adj=0.929, (0 split)
##       Y4  < 101.5   to the left,  agree=0.952, adj=0.900, (0 split)
##       Y3  < 74.275  to the left,  agree=0.917, adj=0.828, (0 split)
##       Y47 < 894.03  to the right, agree=0.906, adj=0.805, (0 split)
##       X49 < 42.165  to the right, agree=0.903, adj=0.800, (0 split)
## 
## Node number 2: 12983 observations,    complexity param=0.03201137
##   mean=3860471, MSE=6.426806e+09 
##   left son=4 (8919 obs) right son=5 (4064 obs)
##   Primary splits:
##       Y7  < 1.025   to the right, improve=0.1554156, (0 missing)
##       Y23 < 594.465 to the left,  improve=0.1387768, (0 missing)
##       X7  < 989.66  to the left,  improve=0.1387669, (0 missing)
##       Y37 < 843.14  to the left,  improve=0.1363654, (0 missing)
##       Y43 < 987.355 to the left,  improve=0.1235102, (0 missing)
##   Surrogate splits:
##       Y5  < 0.005   to the right, agree=0.790, adj=0.330, (0 split)
##       Y21 < 252.415 to the right, agree=0.783, adj=0.308, (0 split)
##       X42 < 949.225 to the left,  agree=0.782, adj=0.304, (0 split)
##       Y13 < 108.185 to the right, agree=0.780, adj=0.299, (0 split)
##       Y12 < 113.885 to the right, agree=0.780, adj=0.298, (0 split)
## 
## Node number 3: 13796 observations,    complexity param=0.08152149
##   mean=4008702, MSE=1.266275e+10 
##   left son=6 (10871 obs) right son=7 (2925 obs)
##   Primary splits:
##       Y8  < 0.005   to the right, improve=0.1890390, (0 missing)
##       Y7  < 0.115   to the right, improve=0.1705424, (0 missing)
##       X42 < 346.155 to the right, improve=0.1552648, (0 missing)
##       X8  < 975     to the left,  improve=0.1485809, (0 missing)
##       X41 < 424.65  to the right, improve=0.1463478, (0 missing)
##   Surrogate splits:
##       Y7  < 17.325  to the right, agree=0.955, adj=0.786, (0 split)
##       X8  < 975     to the left,  agree=0.953, adj=0.780, (0 split)
##       X7  < 945.115 to the left,  agree=0.898, adj=0.519, (0 split)
##       Y6  < 36.57   to the right, agree=0.885, adj=0.459, (0 split)
##       Y46 < 874.895 to the left,  agree=0.861, adj=0.346, (0 split)
## 
## Node number 4: 8919 observations,    complexity param=0.01468595
##   mean=3839138, MSE=6.018791e+09 
##   left son=8 (8129 obs) right son=9 (790 obs)
##   Primary splits:
##       Y36 < 888.9   to the left,  improve=0.11082490, (0 missing)
##       Y37 < 849.52  to the left,  improve=0.09979984, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08753593, (0 missing)
##       X38 < 301.375 to the right, improve=0.08475904, (0 missing)
##       Y31 < 762.785 to the left,  improve=0.08455488, (0 missing)
##   Surrogate splits:
##       Y22 < 597.305 to the left,  agree=0.950, adj=0.432, (0 split)
##       Y41 < 732.145 to the right, agree=0.939, adj=0.306, (0 split)
##       Y35 < 502.59  to the right, agree=0.938, adj=0.300, (0 split)
##       Y14 < 3.185   to the right, agree=0.936, adj=0.280, (0 split)
##       Y28 < 228.085 to the right, agree=0.936, adj=0.272, (0 split)
## 
## Node number 5: 4064 observations,    complexity param=0.01519979
##   mean=3907291, MSE=4.131366e+09 
##   left son=10 (1113 obs) right son=11 (2951 obs)
##   Primary splits:
##       X7  < 989.04  to the left,  improve=0.3667337, (0 missing)
##       Y42 < 722.315 to the right, improve=0.2568592, (0 missing)
##       Y45 < 954.95  to the left,  improve=0.2413413, (0 missing)
##       Y49 < 924.325 to the right, improve=0.2015504, (0 missing)
##       Y37 < 880.775 to the left,  improve=0.2007039, (0 missing)
##   Surrogate splits:
##       X6  < 777.195 to the left,  agree=0.861, adj=0.494, (0 split)
##       X49 < 704.45  to the left,  agree=0.838, adj=0.407, (0 split)
##       X22 < 125     to the right, agree=0.824, adj=0.358, (0 split)
##       X8  < 625     to the right, agree=0.822, adj=0.351, (0 split)
##       Y15 < 175     to the left,  agree=0.822, adj=0.350, (0 split)
## 
## Node number 6: 10871 observations,    complexity param=0.04833354
##   mean=3983323, MSE=1.278124e+10 
##   left son=12 (4919 obs) right son=13 (5952 obs)
##   Primary splits:
##       X11 < 689.855 to the left,  improve=0.1409181, (0 missing)
##       X44 < 232.635 to the right, improve=0.1344166, (0 missing)
##       X42 < 346.165 to the right, improve=0.1323906, (0 missing)
##       X41 < 424.65  to the right, improve=0.1261570, (0 missing)
##       X47 < 489.785 to the right, improve=0.1257598, (0 missing)
##   Surrogate splits:
##       X12 < 632.375 to the left,  agree=0.946, adj=0.880, (0 split)
##       X10 < 746.095 to the left,  agree=0.913, adj=0.807, (0 split)
##       X9  < 775     to the left,  agree=0.845, adj=0.658, (0 split)
##       Y12 < 153.435 to the right, agree=0.789, adj=0.533, (0 split)
##       Y10 < 150.61  to the right, agree=0.786, adj=0.528, (0 split)
## 
## Node number 7: 2925 observations
##   mean=4103024, MSE=9.320662e+08 
## 
## Node number 8: 8129 observations,    complexity param=0.01354492
##   mean=3831086, MSE=5.558648e+09 
##   left son=16 (6793 obs) right son=17 (1336 obs)
##   Primary splits:
##       Y35 < 765.05  to the left,  improve=0.12143140, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08900714, (0 missing)
##       X47 < 25      to the right, improve=0.08094458, (0 missing)
##       X5  < 970     to the left,  improve=0.08011203, (0 missing)
##       Y41 < 884.295 to the left,  improve=0.07540451, (0 missing)
##   Surrogate splits:
##       Y34 < 766.615 to the left,  agree=0.910, adj=0.455, (0 split)
##       Y33 < 775     to the left,  agree=0.877, adj=0.249, (0 split)
##       Y42 < 925     to the left,  agree=0.875, adj=0.238, (0 split)
##       X30 < 8.765   to the right, agree=0.863, adj=0.167, (0 split)
##       Y41 < 927.465 to the left,  agree=0.860, adj=0.147, (0 split)
## 
## Node number 9: 790 observations
##   mean=3921985, MSE=3.222901e+09 
## 
## Node number 10: 1113 observations
##   mean=3843910, MSE=2.99382e+09 
## 
## Node number 11: 2951 observations
##   mean=3931195, MSE=2.473852e+09 
## 
## Node number 12: 4919 observations,    complexity param=0.04293438
##   mean=3936640, MSE=1.519331e+10 
##   left son=24 (2716 obs) right son=25 (2203 obs)
##   Primary splits:
##       X6  < 848.785 to the left,  improve=0.2327215, (0 missing)
##       X5  < 995     to the left,  improve=0.2205077, (0 missing)
##       X8  < 816.01  to the left,  improve=0.2144716, (0 missing)
##       X1  < 575     to the left,  improve=0.1638708, (0 missing)
##       X39 < 325     to the right, improve=0.1629009, (0 missing)
##   Surrogate splits:
##       X5 < 942.285 to the left,  agree=0.981, adj=0.957, (0 split)
##       X8 < 816.01  to the left,  agree=0.979, adj=0.954, (0 split)
##       X7 < 875.59  to the left,  agree=0.953, adj=0.895, (0 split)
##       X1 < 575     to the left,  agree=0.770, adj=0.488, (0 split)
##       X2 < 375     to the left,  agree=0.768, adj=0.481, (0 split)
## 
## Node number 13: 5952 observations,    complexity param=0.02321593
##   mean=4021905, MSE=7.498166e+09 
##   left son=26 (1958 obs) right son=27 (3994 obs)
##   Primary splits:
##       X44 < 232.48  to the right, improve=0.2107311, (0 missing)
##       X43 < 289.965 to the right, improve=0.2058388, (0 missing)
##       X42 < 346.155 to the right, improve=0.1741573, (0 missing)
##       X47 < 689.725 to the right, improve=0.1671233, (0 missing)
##       X46 < 746.075 to the right, improve=0.1531187, (0 missing)
##   Surrogate splits:
##       X43 < 289.945 to the right, agree=0.968, adj=0.904, (0 split)
##       X42 < 346.155 to the right, agree=0.882, adj=0.640, (0 split)
##       X41 < 425     to the right, agree=0.865, adj=0.589, (0 split)
##       Y43 < 821     to the left,  agree=0.769, adj=0.298, (0 split)
##       Y44 < 825     to the left,  agree=0.769, adj=0.297, (0 split)
## 
## Node number 16: 6793 observations
##   mean=3819564, MSE=4.723845e+09 
## 
## Node number 17: 1336 observations
##   mean=3889670, MSE=5.696214e+09 
## 
## Node number 24: 2716 observations,    complexity param=0.02518016
##   mean=3883087, MSE=1.584461e+10 
##   left son=48 (1857 obs) right son=49 (859 obs)
##   Primary splits:
##       Y14 < 37.785  to the right, improve=0.2370324, (0 missing)
##       X15 < 689.855 to the left,  improve=0.2369716, (0 missing)
##       Y16 < 112.665 to the right, improve=0.2362359, (0 missing)
##       Y15 < 75.22   to the right, improve=0.2356853, (0 missing)
##       Y13 < 59.77   to the right, improve=0.2325060, (0 missing)
##   Surrogate splits:
##       Y15 < 75.22   to the right, agree=0.999, adj=0.997, (0 split)
##       Y16 < 112.665 to the right, agree=0.999, adj=0.995, (0 split)
##       Y13 < 31.035  to the right, agree=0.992, adj=0.973, (0 split)
##       Y10 < 37.785  to the right, agree=0.912, adj=0.721, (0 split)
##       Y11 < 75.22   to the right, agree=0.900, adj=0.685, (0 split)
## 
## Node number 25: 2203 observations
##   mean=4002664, MSE=6.495363e+09 
## 
## Node number 26: 1958 observations,    complexity param=0.01148675
##   mean=3965132, MSE=1.048974e+10 
##   left son=52 (1468 obs) right son=53 (490 obs)
##   Primary splits:
##       X48 < 32.6    to the right, improve=0.2265584, (0 missing)
##       X46 < 146.145 to the right, improve=0.2230136, (0 missing)
##       X47 < 89.965  to the right, improve=0.2216917, (0 missing)
##       X45 < 300     to the right, improve=0.2161759, (0 missing)
##       X38 < 546.065 to the right, improve=0.1853625, (0 missing)
##   Surrogate splits:
##       X46 < 146.145 to the right, agree=0.993, adj=0.973, (0 split)
##       X47 < 89.965  to the right, agree=0.993, adj=0.973, (0 split)
##       X45 < 300     to the right, agree=0.982, adj=0.929, (0 split)
##       Y44 < 712.155 to the right, agree=0.919, adj=0.678, (0 split)
##       Y43 < 674.97  to the right, agree=0.907, adj=0.627, (0 split)
## 
## Node number 27: 3994 observations
##   mean=4049737, MSE=3.676873e+09 
## 
## Node number 48: 1857 observations
##   mean=3841406, MSE=1.27667e+10 
## 
## Node number 49: 859 observations
##   mean=3973193, MSE=1.062369e+10 
## 
## Node number 52: 1468 observations
##   mean=3936967, MSE=1.019573e+10 
## 
## Node number 53: 490 observations
##   mean=4049512, MSE=1.874112e+09
y <- predict(tree1,newdata=dataTE)

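# As written, the ^2 squares the summed residuals, so this is |sum(residuals)| / sqrt(n), not the RMSE.
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power))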
## [1] 736142.8
mean(dataTE$Total_Power)
## [1] 3942322
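The fits in this section differ only in `maxdepth`, so the comparison can be expressed as a loop that collects one test error per depth. The following is a minimal sketch on the built-in `mtcars` data, not on the homework dataset; the 0.7/0.3 split mirrors the one described in the introduction, while `rmse()`, `minsplit = 5` and `xval = 5` are assumptions chosen to suit the small toy data.

```r
library(rpart)

# Hypothetical helper: root mean square error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# 70/30 train/test split on a toy dataset.
set.seed(1)
idx   <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Fit one regression tree per depth and record its test RMSE.
depths <- 1:3
test_rmse <- sapply(depths, function(d) {
  fit <- rpart(mpg ~ ., data = train, method = "anova",
               control = rpart.control(maxdepth = d, minsplit = 5, xval = 5))
  rmse(test$mpg, predict(fit, newdata = test))
})

data.frame(maxdepth = depths, test_rmse = test_rmse)
```

Writing the response as a bare column name (`mpg ~ .`) keeps the formula independent of any particular data frame, which makes `predict(..., newdata = ...)` easier to reason about.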
# Same model with the depth cap raised to 6.
tree1 <- rpart(dataTR$Total_Power ~ ., method = "anova", data = dataTR, maxdepth = 6, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Total_Power ~ ., data = dataTR, method = "anova", 
##     maxdepth = 6, xval = 10)
##   n= 26779 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.36278557      0 1.0000000 1.0001406 0.007211747
## 2  0.08152149      1 0.6372144 0.6373313 0.007603892
## 3  0.04833354      2 0.5556929 0.5558731 0.006622679
## 4  0.04293438      3 0.5073594 0.5076217 0.005956910
## 5  0.03201137      4 0.4644250 0.4647452 0.005547817
## 6  0.02518016      5 0.4324136 0.4327468 0.005368358
## 7  0.02321593      6 0.4072335 0.4096175 0.005167989
## 8  0.01691646      7 0.3840176 0.3867312 0.004736046
## 9  0.01519979      8 0.3671011 0.3767329 0.004595393
## 10 0.01468595      9 0.3519013 0.3620376 0.004552858
## 11 0.01354492     10 0.3372154 0.3503723 0.004499248
## 12 0.01148675     11 0.3236704 0.3286749 0.004396416
## 13 0.01000000     12 0.3121837 0.3218257 0.004322894
## 
## Variable importance
## Y46 Y45  Y4  Y3 X49 Y47  X8  Y7  X7  Y8  X6 X11 Y10 X12  X5 X10  X1  Y6  X2 Y12 
##  12  11  10   9   9   9   3   3   3   3   2   2   1   1   1   1   1   1   1   1 
## Y13  X9 Y15 Y14 Y16 X42 X44 X43 Y35 Y11  X3 
##   1   1   1   1   1   1   1   1   1   1   1 
## 
## Node number 1: 26779 observations,    complexity param=0.3627856
##   mean=3936837, MSE=1.512747e+10 
##   left son=2 (12983 obs) right son=3 (13796 obs)
##   Primary splits:
##       Y46 < 875.95  to the right, improve=0.3627856, (0 missing)
##       Y45 < 843.24  to the right, improve=0.3548318, (0 missing)
##       Y4  < 101.5   to the left,  improve=0.3292959, (0 missing)
##       Y3  < 74.275  to the left,  improve=0.3183650, (0 missing)
##       Y47 < 912.745 to the right, improve=0.2426537, (0 missing)
##   Surrogate splits:
##       Y45 < 843.24  to the right, agree=0.966, adj=0.929, (0 split)
##       Y4  < 101.5   to the left,  agree=0.952, adj=0.900, (0 split)
##       Y3  < 74.275  to the left,  agree=0.917, adj=0.828, (0 split)
##       Y47 < 894.03  to the right, agree=0.906, adj=0.805, (0 split)
##       X49 < 42.165  to the right, agree=0.903, adj=0.800, (0 split)
## 
## Node number 2: 12983 observations,    complexity param=0.03201137
##   mean=3860471, MSE=6.426806e+09 
##   left son=4 (8919 obs) right son=5 (4064 obs)
##   Primary splits:
##       Y7  < 1.025   to the right, improve=0.1554156, (0 missing)
##       Y23 < 594.465 to the left,  improve=0.1387768, (0 missing)
##       X7  < 989.66  to the left,  improve=0.1387669, (0 missing)
##       Y37 < 843.14  to the left,  improve=0.1363654, (0 missing)
##       Y43 < 987.355 to the left,  improve=0.1235102, (0 missing)
##   Surrogate splits:
##       Y5  < 0.005   to the right, agree=0.790, adj=0.330, (0 split)
##       Y21 < 252.415 to the right, agree=0.783, adj=0.308, (0 split)
##       X42 < 949.225 to the left,  agree=0.782, adj=0.304, (0 split)
##       Y13 < 108.185 to the right, agree=0.780, adj=0.299, (0 split)
##       Y12 < 113.885 to the right, agree=0.780, adj=0.298, (0 split)
## 
## Node number 3: 13796 observations,    complexity param=0.08152149
##   mean=4008702, MSE=1.266275e+10 
##   left son=6 (10871 obs) right son=7 (2925 obs)
##   Primary splits:
##       Y8  < 0.005   to the right, improve=0.1890390, (0 missing)
##       Y7  < 0.115   to the right, improve=0.1705424, (0 missing)
##       X42 < 346.155 to the right, improve=0.1552648, (0 missing)
##       X8  < 975     to the left,  improve=0.1485809, (0 missing)
##       X41 < 424.65  to the right, improve=0.1463478, (0 missing)
##   Surrogate splits:
##       Y7  < 17.325  to the right, agree=0.955, adj=0.786, (0 split)
##       X8  < 975     to the left,  agree=0.953, adj=0.780, (0 split)
##       X7  < 945.115 to the left,  agree=0.898, adj=0.519, (0 split)
##       Y6  < 36.57   to the right, agree=0.885, adj=0.459, (0 split)
##       Y46 < 874.895 to the left,  agree=0.861, adj=0.346, (0 split)
## 
## Node number 4: 8919 observations,    complexity param=0.01468595
##   mean=3839138, MSE=6.018791e+09 
##   left son=8 (8129 obs) right son=9 (790 obs)
##   Primary splits:
##       Y36 < 888.9   to the left,  improve=0.11082490, (0 missing)
##       Y37 < 849.52  to the left,  improve=0.09979984, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08753593, (0 missing)
##       X38 < 301.375 to the right, improve=0.08475904, (0 missing)
##       Y31 < 762.785 to the left,  improve=0.08455488, (0 missing)
##   Surrogate splits:
##       Y22 < 597.305 to the left,  agree=0.950, adj=0.432, (0 split)
##       Y41 < 732.145 to the right, agree=0.939, adj=0.306, (0 split)
##       Y35 < 502.59  to the right, agree=0.938, adj=0.300, (0 split)
##       Y14 < 3.185   to the right, agree=0.936, adj=0.280, (0 split)
##       Y28 < 228.085 to the right, agree=0.936, adj=0.272, (0 split)
## 
## Node number 5: 4064 observations,    complexity param=0.01519979
##   mean=3907291, MSE=4.131366e+09 
##   left son=10 (1113 obs) right son=11 (2951 obs)
##   Primary splits:
##       X7  < 989.04  to the left,  improve=0.3667337, (0 missing)
##       Y42 < 722.315 to the right, improve=0.2568592, (0 missing)
##       Y45 < 954.95  to the left,  improve=0.2413413, (0 missing)
##       Y49 < 924.325 to the right, improve=0.2015504, (0 missing)
##       Y37 < 880.775 to the left,  improve=0.2007039, (0 missing)
##   Surrogate splits:
##       X6  < 777.195 to the left,  agree=0.861, adj=0.494, (0 split)
##       X49 < 704.45  to the left,  agree=0.838, adj=0.407, (0 split)
##       X22 < 125     to the right, agree=0.824, adj=0.358, (0 split)
##       X8  < 625     to the right, agree=0.822, adj=0.351, (0 split)
##       Y15 < 175     to the left,  agree=0.822, adj=0.350, (0 split)
## 
## Node number 6: 10871 observations,    complexity param=0.04833354
##   mean=3983323, MSE=1.278124e+10 
##   left son=12 (4919 obs) right son=13 (5952 obs)
##   Primary splits:
##       X11 < 689.855 to the left,  improve=0.1409181, (0 missing)
##       X44 < 232.635 to the right, improve=0.1344166, (0 missing)
##       X42 < 346.165 to the right, improve=0.1323906, (0 missing)
##       X41 < 424.65  to the right, improve=0.1261570, (0 missing)
##       X47 < 489.785 to the right, improve=0.1257598, (0 missing)
##   Surrogate splits:
##       X12 < 632.375 to the left,  agree=0.946, adj=0.880, (0 split)
##       X10 < 746.095 to the left,  agree=0.913, adj=0.807, (0 split)
##       X9  < 775     to the left,  agree=0.845, adj=0.658, (0 split)
##       Y12 < 153.435 to the right, agree=0.789, adj=0.533, (0 split)
##       Y10 < 150.61  to the right, agree=0.786, adj=0.528, (0 split)
## 
## Node number 7: 2925 observations
##   mean=4103024, MSE=9.320662e+08 
## 
## Node number 8: 8129 observations,    complexity param=0.01354492
##   mean=3831086, MSE=5.558648e+09 
##   left son=16 (6793 obs) right son=17 (1336 obs)
##   Primary splits:
##       Y35 < 765.05  to the left,  improve=0.12143140, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08900714, (0 missing)
##       X47 < 25      to the right, improve=0.08094458, (0 missing)
##       X5  < 970     to the left,  improve=0.08011203, (0 missing)
##       Y41 < 884.295 to the left,  improve=0.07540451, (0 missing)
##   Surrogate splits:
##       Y34 < 766.615 to the left,  agree=0.910, adj=0.455, (0 split)
##       Y33 < 775     to the left,  agree=0.877, adj=0.249, (0 split)
##       Y42 < 925     to the left,  agree=0.875, adj=0.238, (0 split)
##       X30 < 8.765   to the right, agree=0.863, adj=0.167, (0 split)
##       Y41 < 927.465 to the left,  agree=0.860, adj=0.147, (0 split)
## 
## Node number 9: 790 observations
##   mean=3921985, MSE=3.222901e+09 
## 
## Node number 10: 1113 observations
##   mean=3843910, MSE=2.99382e+09 
## 
## Node number 11: 2951 observations
##   mean=3931195, MSE=2.473852e+09 
## 
## Node number 12: 4919 observations,    complexity param=0.04293438
##   mean=3936640, MSE=1.519331e+10 
##   left son=24 (2716 obs) right son=25 (2203 obs)
##   Primary splits:
##       X6  < 848.785 to the left,  improve=0.2327215, (0 missing)
##       X5  < 995     to the left,  improve=0.2205077, (0 missing)
##       X8  < 816.01  to the left,  improve=0.2144716, (0 missing)
##       X1  < 575     to the left,  improve=0.1638708, (0 missing)
##       X39 < 325     to the right, improve=0.1629009, (0 missing)
##   Surrogate splits:
##       X5 < 942.285 to the left,  agree=0.981, adj=0.957, (0 split)
##       X8 < 816.01  to the left,  agree=0.979, adj=0.954, (0 split)
##       X7 < 875.59  to the left,  agree=0.953, adj=0.895, (0 split)
##       X1 < 575     to the left,  agree=0.770, adj=0.488, (0 split)
##       X2 < 375     to the left,  agree=0.768, adj=0.481, (0 split)
## 
## Node number 13: 5952 observations,    complexity param=0.02321593
##   mean=4021905, MSE=7.498166e+09 
##   left son=26 (1958 obs) right son=27 (3994 obs)
##   Primary splits:
##       X44 < 232.48  to the right, improve=0.2107311, (0 missing)
##       X43 < 289.965 to the right, improve=0.2058388, (0 missing)
##       X42 < 346.155 to the right, improve=0.1741573, (0 missing)
##       X47 < 689.725 to the right, improve=0.1671233, (0 missing)
##       X46 < 746.075 to the right, improve=0.1531187, (0 missing)
##   Surrogate splits:
##       X43 < 289.945 to the right, agree=0.968, adj=0.904, (0 split)
##       X42 < 346.155 to the right, agree=0.882, adj=0.640, (0 split)
##       X41 < 425     to the right, agree=0.865, adj=0.589, (0 split)
##       Y43 < 821     to the left,  agree=0.769, adj=0.298, (0 split)
##       Y44 < 825     to the left,  agree=0.769, adj=0.297, (0 split)
## 
## Node number 16: 6793 observations
##   mean=3819564, MSE=4.723845e+09 
## 
## Node number 17: 1336 observations
##   mean=3889670, MSE=5.696214e+09 
## 
## Node number 24: 2716 observations,    complexity param=0.02518016
##   mean=3883087, MSE=1.584461e+10 
##   left son=48 (1857 obs) right son=49 (859 obs)
##   Primary splits:
##       Y14 < 37.785  to the right, improve=0.2370324, (0 missing)
##       X15 < 689.855 to the left,  improve=0.2369716, (0 missing)
##       Y16 < 112.665 to the right, improve=0.2362359, (0 missing)
##       Y15 < 75.22   to the right, improve=0.2356853, (0 missing)
##       Y13 < 59.77   to the right, improve=0.2325060, (0 missing)
##   Surrogate splits:
##       Y15 < 75.22   to the right, agree=0.999, adj=0.997, (0 split)
##       Y16 < 112.665 to the right, agree=0.999, adj=0.995, (0 split)
##       Y13 < 31.035  to the right, agree=0.992, adj=0.973, (0 split)
##       Y10 < 37.785  to the right, agree=0.912, adj=0.721, (0 split)
##       Y11 < 75.22   to the right, agree=0.900, adj=0.685, (0 split)
## 
## Node number 25: 2203 observations
##   mean=4002664, MSE=6.495363e+09 
## 
## Node number 26: 1958 observations,    complexity param=0.01148675
##   mean=3965132, MSE=1.048974e+10 
##   left son=52 (1468 obs) right son=53 (490 obs)
##   Primary splits:
##       X48 < 32.6    to the right, improve=0.2265584, (0 missing)
##       X46 < 146.145 to the right, improve=0.2230136, (0 missing)
##       X47 < 89.965  to the right, improve=0.2216917, (0 missing)
##       X45 < 300     to the right, improve=0.2161759, (0 missing)
##       X38 < 546.065 to the right, improve=0.1853625, (0 missing)
##   Surrogate splits:
##       X46 < 146.145 to the right, agree=0.993, adj=0.973, (0 split)
##       X47 < 89.965  to the right, agree=0.993, adj=0.973, (0 split)
##       X45 < 300     to the right, agree=0.982, adj=0.929, (0 split)
##       Y44 < 712.155 to the right, agree=0.919, adj=0.678, (0 split)
##       Y43 < 674.97  to the right, agree=0.907, adj=0.627, (0 split)
## 
## Node number 27: 3994 observations
##   mean=4049737, MSE=3.676873e+09 
## 
## Node number 48: 1857 observations,    complexity param=0.01691646
##   mean=3841406, MSE=1.27667e+10 
##   left son=96 (1579 obs) right son=97 (278 obs)
##   Primary splits:
##       X1  < 900     to the left,  improve=0.2890544, (0 missing)
##       X2  < 922.925 to the left,  improve=0.2781630, (0 missing)
##       X3  < 869.51  to the left,  improve=0.2775724, (0 missing)
##       X4  < 826.82  to the left,  improve=0.2505744, (0 missing)
##       X42 < 346.165 to the right, improve=0.1243527, (0 missing)
##   Surrogate splits:
##       X2  < 922.925 to the left,  agree=0.997, adj=0.982, (0 split)
##       X3  < 869.51  to the left,  agree=0.997, adj=0.978, (0 split)
##       X4  < 826.82  to the left,  agree=0.987, adj=0.910, (0 split)
##       X10 < 746.095 to the left,  agree=0.858, adj=0.050, (0 split)
##       X26 < 145.88  to the right, agree=0.854, adj=0.022, (0 split)
## 
## Node number 49: 859 observations
##   mean=3973193, MSE=1.062369e+10 
## 
## Node number 52: 1468 observations
##   mean=3936967, MSE=1.019573e+10 
## 
## Node number 53: 490 observations
##   mean=4049512, MSE=1.874112e+09 
## 
## Node number 96: 1579 observations
##   mean=3815916, MSE=9.502688e+09 
## 
## Node number 97: 278 observations
##   mean=3986182, MSE=6.655327e+09
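# Predict Total_Power for the held-out test set
y <- predict(tree1, newdata = dataTE)

# Test RMSE: square each residual before summing.
# The value printed below was produced with sum(...)^2, i.e. the square applied after summing.
sqrt(sum((y - dataTE$Total_Power)^2) / length(dataTE$Total_Power))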
## [1] 723697.8
mean(dataTE$Total_Power)
## [1] 3942322
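As a side check on the error metric: RMSE squares each residual before summing, whereas squaring the sum of residuals lets positive and negative errors cancel. The sketch below uses made-up vectors (`actual` and `pred` are hypothetical, not taken from the dataset) to show how far the two quantities can diverge.

```r
# Hypothetical toy vectors, not taken from the wave-energy data
actual <- c(10, 20, 30, 40)
pred   <- c(12, 18, 33, 37)

resid <- pred - actual   # 2, -2, 3, -3

# RMSE: square each residual, average, then take the root
rmse <- sqrt(sum(resid^2) / length(resid))

# Squaring the sum instead: opposite-signed errors cancel before squaring
sum_then_square <- sqrt(sum(resid)^2 / length(resid))

rmse             # about 2.55
sum_then_square  # 0, although no prediction is exact
```

The second quantity equals the absolute value of the summed error divided by the square root of n, so it measures overall bias rather than typical error size.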
# Regression tree for Total_Power: depth capped at 7, 10-fold cross-validation
tree1 <- rpart(dataTR$Total_Power ~ ., method = "anova", data = dataTR, maxdepth = 7, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Total_Power ~ ., data = dataTR, method = "anova", 
##     maxdepth = 7, xval = 10)
##   n= 26779 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.36278557      0 1.0000000 1.0000816 0.007210889
## 2  0.08152149      1 0.6372144 0.6372940 0.007603432
## 3  0.04833354      2 0.5556929 0.5558427 0.006622318
## 4  0.04293438      3 0.5073594 0.5075582 0.005956005
## 5  0.03201137      4 0.4644250 0.4647886 0.005547386
## 6  0.02518016      5 0.4324136 0.4328664 0.005368838
## 7  0.02321593      6 0.4072335 0.4094955 0.005168820
## 8  0.01691646      7 0.3840176 0.3866647 0.004738705
## 9  0.01519979      8 0.3671011 0.3796586 0.004648296
## 10 0.01468595      9 0.3519013 0.3610730 0.004596265
## 11 0.01354492     10 0.3372154 0.3538380 0.004554872
## 12 0.01148675     11 0.3236704 0.3342134 0.004483673
## 13 0.01000000     12 0.3121837 0.3245792 0.004367599
## 
## Variable importance
## Y46 Y45  Y4  Y3 X49 Y47  X8  Y7  X7  Y8  X6 X11 Y10 X12  X5 X10  X1  Y6  X2 Y12 
##  12  11  10   9   9   9   3   3   3   3   2   2   1   1   1   1   1   1   1   1 
## Y13  X9 Y15 Y14 Y16 X42 X44 X43 Y35 Y11  X3 
##   1   1   1   1   1   1   1   1   1   1   1 
## 
## Node number 1: 26779 observations,    complexity param=0.3627856
##   mean=3936837, MSE=1.512747e+10 
##   left son=2 (12983 obs) right son=3 (13796 obs)
##   Primary splits:
##       Y46 < 875.95  to the right, improve=0.3627856, (0 missing)
##       Y45 < 843.24  to the right, improve=0.3548318, (0 missing)
##       Y4  < 101.5   to the left,  improve=0.3292959, (0 missing)
##       Y3  < 74.275  to the left,  improve=0.3183650, (0 missing)
##       Y47 < 912.745 to the right, improve=0.2426537, (0 missing)
##   Surrogate splits:
##       Y45 < 843.24  to the right, agree=0.966, adj=0.929, (0 split)
##       Y4  < 101.5   to the left,  agree=0.952, adj=0.900, (0 split)
##       Y3  < 74.275  to the left,  agree=0.917, adj=0.828, (0 split)
##       Y47 < 894.03  to the right, agree=0.906, adj=0.805, (0 split)
##       X49 < 42.165  to the right, agree=0.903, adj=0.800, (0 split)
## 
## Node number 2: 12983 observations,    complexity param=0.03201137
##   mean=3860471, MSE=6.426806e+09 
##   left son=4 (8919 obs) right son=5 (4064 obs)
##   Primary splits:
##       Y7  < 1.025   to the right, improve=0.1554156, (0 missing)
##       Y23 < 594.465 to the left,  improve=0.1387768, (0 missing)
##       X7  < 989.66  to the left,  improve=0.1387669, (0 missing)
##       Y37 < 843.14  to the left,  improve=0.1363654, (0 missing)
##       Y43 < 987.355 to the left,  improve=0.1235102, (0 missing)
##   Surrogate splits:
##       Y5  < 0.005   to the right, agree=0.790, adj=0.330, (0 split)
##       Y21 < 252.415 to the right, agree=0.783, adj=0.308, (0 split)
##       X42 < 949.225 to the left,  agree=0.782, adj=0.304, (0 split)
##       Y13 < 108.185 to the right, agree=0.780, adj=0.299, (0 split)
##       Y12 < 113.885 to the right, agree=0.780, adj=0.298, (0 split)
## 
## Node number 3: 13796 observations,    complexity param=0.08152149
##   mean=4008702, MSE=1.266275e+10 
##   left son=6 (10871 obs) right son=7 (2925 obs)
##   Primary splits:
##       Y8  < 0.005   to the right, improve=0.1890390, (0 missing)
##       Y7  < 0.115   to the right, improve=0.1705424, (0 missing)
##       X42 < 346.155 to the right, improve=0.1552648, (0 missing)
##       X8  < 975     to the left,  improve=0.1485809, (0 missing)
##       X41 < 424.65  to the right, improve=0.1463478, (0 missing)
##   Surrogate splits:
##       Y7  < 17.325  to the right, agree=0.955, adj=0.786, (0 split)
##       X8  < 975     to the left,  agree=0.953, adj=0.780, (0 split)
##       X7  < 945.115 to the left,  agree=0.898, adj=0.519, (0 split)
##       Y6  < 36.57   to the right, agree=0.885, adj=0.459, (0 split)
##       Y46 < 874.895 to the left,  agree=0.861, adj=0.346, (0 split)
## 
## Node number 4: 8919 observations,    complexity param=0.01468595
##   mean=3839138, MSE=6.018791e+09 
##   left son=8 (8129 obs) right son=9 (790 obs)
##   Primary splits:
##       Y36 < 888.9   to the left,  improve=0.11082490, (0 missing)
##       Y37 < 849.52  to the left,  improve=0.09979984, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08753593, (0 missing)
##       X38 < 301.375 to the right, improve=0.08475904, (0 missing)
##       Y31 < 762.785 to the left,  improve=0.08455488, (0 missing)
##   Surrogate splits:
##       Y22 < 597.305 to the left,  agree=0.950, adj=0.432, (0 split)
##       Y41 < 732.145 to the right, agree=0.939, adj=0.306, (0 split)
##       Y35 < 502.59  to the right, agree=0.938, adj=0.300, (0 split)
##       Y14 < 3.185   to the right, agree=0.936, adj=0.280, (0 split)
##       Y28 < 228.085 to the right, agree=0.936, adj=0.272, (0 split)
## 
## Node number 5: 4064 observations,    complexity param=0.01519979
##   mean=3907291, MSE=4.131366e+09 
##   left son=10 (1113 obs) right son=11 (2951 obs)
##   Primary splits:
##       X7  < 989.04  to the left,  improve=0.3667337, (0 missing)
##       Y42 < 722.315 to the right, improve=0.2568592, (0 missing)
##       Y45 < 954.95  to the left,  improve=0.2413413, (0 missing)
##       Y49 < 924.325 to the right, improve=0.2015504, (0 missing)
##       Y37 < 880.775 to the left,  improve=0.2007039, (0 missing)
##   Surrogate splits:
##       X6  < 777.195 to the left,  agree=0.861, adj=0.494, (0 split)
##       X49 < 704.45  to the left,  agree=0.838, adj=0.407, (0 split)
##       X22 < 125     to the right, agree=0.824, adj=0.358, (0 split)
##       X8  < 625     to the right, agree=0.822, adj=0.351, (0 split)
##       Y15 < 175     to the left,  agree=0.822, adj=0.350, (0 split)
## 
## Node number 6: 10871 observations,    complexity param=0.04833354
##   mean=3983323, MSE=1.278124e+10 
##   left son=12 (4919 obs) right son=13 (5952 obs)
##   Primary splits:
##       X11 < 689.855 to the left,  improve=0.1409181, (0 missing)
##       X44 < 232.635 to the right, improve=0.1344166, (0 missing)
##       X42 < 346.165 to the right, improve=0.1323906, (0 missing)
##       X41 < 424.65  to the right, improve=0.1261570, (0 missing)
##       X47 < 489.785 to the right, improve=0.1257598, (0 missing)
##   Surrogate splits:
##       X12 < 632.375 to the left,  agree=0.946, adj=0.880, (0 split)
##       X10 < 746.095 to the left,  agree=0.913, adj=0.807, (0 split)
##       X9  < 775     to the left,  agree=0.845, adj=0.658, (0 split)
##       Y12 < 153.435 to the right, agree=0.789, adj=0.533, (0 split)
##       Y10 < 150.61  to the right, agree=0.786, adj=0.528, (0 split)
## 
## Node number 7: 2925 observations
##   mean=4103024, MSE=9.320662e+08 
## 
## Node number 8: 8129 observations,    complexity param=0.01354492
##   mean=3831086, MSE=5.558648e+09 
##   left son=16 (6793 obs) right son=17 (1336 obs)
##   Primary splits:
##       Y35 < 765.05  to the left,  improve=0.12143140, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08900714, (0 missing)
##       X47 < 25      to the right, improve=0.08094458, (0 missing)
##       X5  < 970     to the left,  improve=0.08011203, (0 missing)
##       Y41 < 884.295 to the left,  improve=0.07540451, (0 missing)
##   Surrogate splits:
##       Y34 < 766.615 to the left,  agree=0.910, adj=0.455, (0 split)
##       Y33 < 775     to the left,  agree=0.877, adj=0.249, (0 split)
##       Y42 < 925     to the left,  agree=0.875, adj=0.238, (0 split)
##       X30 < 8.765   to the right, agree=0.863, adj=0.167, (0 split)
##       Y41 < 927.465 to the left,  agree=0.860, adj=0.147, (0 split)
## 
## Node number 9: 790 observations
##   mean=3921985, MSE=3.222901e+09 
## 
## Node number 10: 1113 observations
##   mean=3843910, MSE=2.99382e+09 
## 
## Node number 11: 2951 observations
##   mean=3931195, MSE=2.473852e+09 
## 
## Node number 12: 4919 observations,    complexity param=0.04293438
##   mean=3936640, MSE=1.519331e+10 
##   left son=24 (2716 obs) right son=25 (2203 obs)
##   Primary splits:
##       X6  < 848.785 to the left,  improve=0.2327215, (0 missing)
##       X5  < 995     to the left,  improve=0.2205077, (0 missing)
##       X8  < 816.01  to the left,  improve=0.2144716, (0 missing)
##       X1  < 575     to the left,  improve=0.1638708, (0 missing)
##       X39 < 325     to the right, improve=0.1629009, (0 missing)
##   Surrogate splits:
##       X5 < 942.285 to the left,  agree=0.981, adj=0.957, (0 split)
##       X8 < 816.01  to the left,  agree=0.979, adj=0.954, (0 split)
##       X7 < 875.59  to the left,  agree=0.953, adj=0.895, (0 split)
##       X1 < 575     to the left,  agree=0.770, adj=0.488, (0 split)
##       X2 < 375     to the left,  agree=0.768, adj=0.481, (0 split)
## 
## Node number 13: 5952 observations,    complexity param=0.02321593
##   mean=4021905, MSE=7.498166e+09 
##   left son=26 (1958 obs) right son=27 (3994 obs)
##   Primary splits:
##       X44 < 232.48  to the right, improve=0.2107311, (0 missing)
##       X43 < 289.965 to the right, improve=0.2058388, (0 missing)
##       X42 < 346.155 to the right, improve=0.1741573, (0 missing)
##       X47 < 689.725 to the right, improve=0.1671233, (0 missing)
##       X46 < 746.075 to the right, improve=0.1531187, (0 missing)
##   Surrogate splits:
##       X43 < 289.945 to the right, agree=0.968, adj=0.904, (0 split)
##       X42 < 346.155 to the right, agree=0.882, adj=0.640, (0 split)
##       X41 < 425     to the right, agree=0.865, adj=0.589, (0 split)
##       Y43 < 821     to the left,  agree=0.769, adj=0.298, (0 split)
##       Y44 < 825     to the left,  agree=0.769, adj=0.297, (0 split)
## 
## Node number 16: 6793 observations
##   mean=3819564, MSE=4.723845e+09 
## 
## Node number 17: 1336 observations
##   mean=3889670, MSE=5.696214e+09 
## 
## Node number 24: 2716 observations,    complexity param=0.02518016
##   mean=3883087, MSE=1.584461e+10 
##   left son=48 (1857 obs) right son=49 (859 obs)
##   Primary splits:
##       Y14 < 37.785  to the right, improve=0.2370324, (0 missing)
##       X15 < 689.855 to the left,  improve=0.2369716, (0 missing)
##       Y16 < 112.665 to the right, improve=0.2362359, (0 missing)
##       Y15 < 75.22   to the right, improve=0.2356853, (0 missing)
##       Y13 < 59.77   to the right, improve=0.2325060, (0 missing)
##   Surrogate splits:
##       Y15 < 75.22   to the right, agree=0.999, adj=0.997, (0 split)
##       Y16 < 112.665 to the right, agree=0.999, adj=0.995, (0 split)
##       Y13 < 31.035  to the right, agree=0.992, adj=0.973, (0 split)
##       Y10 < 37.785  to the right, agree=0.912, adj=0.721, (0 split)
##       Y11 < 75.22   to the right, agree=0.900, adj=0.685, (0 split)
## 
## Node number 25: 2203 observations
##   mean=4002664, MSE=6.495363e+09 
## 
## Node number 26: 1958 observations,    complexity param=0.01148675
##   mean=3965132, MSE=1.048974e+10 
##   left son=52 (1468 obs) right son=53 (490 obs)
##   Primary splits:
##       X48 < 32.6    to the right, improve=0.2265584, (0 missing)
##       X46 < 146.145 to the right, improve=0.2230136, (0 missing)
##       X47 < 89.965  to the right, improve=0.2216917, (0 missing)
##       X45 < 300     to the right, improve=0.2161759, (0 missing)
##       X38 < 546.065 to the right, improve=0.1853625, (0 missing)
##   Surrogate splits:
##       X46 < 146.145 to the right, agree=0.993, adj=0.973, (0 split)
##       X47 < 89.965  to the right, agree=0.993, adj=0.973, (0 split)
##       X45 < 300     to the right, agree=0.982, adj=0.929, (0 split)
##       Y44 < 712.155 to the right, agree=0.919, adj=0.678, (0 split)
##       Y43 < 674.97  to the right, agree=0.907, adj=0.627, (0 split)
## 
## Node number 27: 3994 observations
##   mean=4049737, MSE=3.676873e+09 
## 
## Node number 48: 1857 observations,    complexity param=0.01691646
##   mean=3841406, MSE=1.27667e+10 
##   left son=96 (1579 obs) right son=97 (278 obs)
##   Primary splits:
##       X1  < 900     to the left,  improve=0.2890544, (0 missing)
##       X2  < 922.925 to the left,  improve=0.2781630, (0 missing)
##       X3  < 869.51  to the left,  improve=0.2775724, (0 missing)
##       X4  < 826.82  to the left,  improve=0.2505744, (0 missing)
##       X42 < 346.165 to the right, improve=0.1243527, (0 missing)
##   Surrogate splits:
##       X2  < 922.925 to the left,  agree=0.997, adj=0.982, (0 split)
##       X3  < 869.51  to the left,  agree=0.997, adj=0.978, (0 split)
##       X4  < 826.82  to the left,  agree=0.987, adj=0.910, (0 split)
##       X10 < 746.095 to the left,  agree=0.858, adj=0.050, (0 split)
##       X26 < 145.88  to the right, agree=0.854, adj=0.022, (0 split)
## 
## Node number 49: 859 observations
##   mean=3973193, MSE=1.062369e+10 
## 
## Node number 52: 1468 observations
##   mean=3936967, MSE=1.019573e+10 
## 
## Node number 53: 490 observations
##   mean=4049512, MSE=1.874112e+09 
## 
## Node number 96: 1579 observations
##   mean=3815916, MSE=9.502688e+09 
## 
## Node number 97: 278 observations
##   mean=3986182, MSE=6.655327e+09
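# Predict Total_Power for the held-out test set
y <- predict(tree1, newdata = dataTE)

# Test RMSE: square each residual before summing.
# The value printed below was produced with sum(...)^2, i.e. the square applied after summing.
sqrt(sum((y - dataTE$Total_Power)^2) / length(dataTE$Total_Power))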
## [1] 723697.8
mean(dataTE$Total_Power)
## [1] 3942322
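A rough plausibility check on the error scale can be read off the summary itself. For an `anova` tree, the `rel error` column is the training residual sum of squares relative to the root, so multiplying it by the root-node MSE gives the training MSE. The sketch below only re-uses two numbers printed above (the root MSE and the final `rel error`); the variable names are illustrative.

```r
# Values copied from the summary output above
root_mse  <- 1.512747e+10   # MSE reported for node 1
rel_error <- 0.3121837      # rel error after 12 splits

# Standard deviation of Total_Power in the training set (error of a mean-only model)
root_sd <- sqrt(root_mse)

# Training RMSE implied by the fitted tree
train_rmse <- sqrt(rel_error * root_mse)

c(root_sd = root_sd, train_rmse = train_rmse)
```

This gives roughly 123,000 for the root standard deviation and roughly 68,700 for the training RMSE. A test RMSE far above the root standard deviation would therefore be worth double-checking, since even predicting the training mean everywhere should land near that level.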
# Regression tree for Total_Power: depth capped at 8, 10-fold cross-validation
tree1 <- rpart(dataTR$Total_Power ~ ., method = "anova", data = dataTR, maxdepth = 8, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Total_Power ~ ., data = dataTR, method = "anova", 
##     maxdepth = 8, xval = 10)
##   n= 26779 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.36278557      0 1.0000000 1.0000676 0.007211339
## 2  0.08152149      1 0.6372144 0.6373315 0.007604795
## 3  0.04833354      2 0.5556929 0.5559021 0.006623813
## 4  0.04293438      3 0.5073594 0.5075875 0.005957534
## 5  0.03201137      4 0.4644250 0.4647465 0.005547842
## 6  0.02518016      5 0.4324136 0.4328277 0.005369440
## 7  0.02321593      6 0.4072335 0.4100853 0.005187899
## 8  0.01691646      7 0.3840176 0.3872132 0.004758365
## 9  0.01519979      8 0.3671011 0.3794939 0.004647898
## 10 0.01468595      9 0.3519013 0.3646313 0.004601151
## 11 0.01354492     10 0.3372154 0.3495865 0.004532012
## 12 0.01148675     11 0.3236704 0.3339097 0.004476211
## 13 0.01000000     12 0.3121837 0.3245621 0.004369423
## 
## Variable importance
## Y46 Y45  Y4  Y3 X49 Y47  X8  Y7  X7  Y8  X6 X11 Y10 X12  X5 X10  X1  Y6  X2 Y12 
##  12  11  10   9   9   9   3   3   3   3   2   2   1   1   1   1   1   1   1   1 
## Y13  X9 Y15 Y14 Y16 X42 X44 X43 Y35 Y11  X3 
##   1   1   1   1   1   1   1   1   1   1   1 
## 
## Node number 1: 26779 observations,    complexity param=0.3627856
##   mean=3936837, MSE=1.512747e+10 
##   left son=2 (12983 obs) right son=3 (13796 obs)
##   Primary splits:
##       Y46 < 875.95  to the right, improve=0.3627856, (0 missing)
##       Y45 < 843.24  to the right, improve=0.3548318, (0 missing)
##       Y4  < 101.5   to the left,  improve=0.3292959, (0 missing)
##       Y3  < 74.275  to the left,  improve=0.3183650, (0 missing)
##       Y47 < 912.745 to the right, improve=0.2426537, (0 missing)
##   Surrogate splits:
##       Y45 < 843.24  to the right, agree=0.966, adj=0.929, (0 split)
##       Y4  < 101.5   to the left,  agree=0.952, adj=0.900, (0 split)
##       Y3  < 74.275  to the left,  agree=0.917, adj=0.828, (0 split)
##       Y47 < 894.03  to the right, agree=0.906, adj=0.805, (0 split)
##       X49 < 42.165  to the right, agree=0.903, adj=0.800, (0 split)
## 
## Node number 2: 12983 observations,    complexity param=0.03201137
##   mean=3860471, MSE=6.426806e+09 
##   left son=4 (8919 obs) right son=5 (4064 obs)
##   Primary splits:
##       Y7  < 1.025   to the right, improve=0.1554156, (0 missing)
##       Y23 < 594.465 to the left,  improve=0.1387768, (0 missing)
##       X7  < 989.66  to the left,  improve=0.1387669, (0 missing)
##       Y37 < 843.14  to the left,  improve=0.1363654, (0 missing)
##       Y43 < 987.355 to the left,  improve=0.1235102, (0 missing)
##   Surrogate splits:
##       Y5  < 0.005   to the right, agree=0.790, adj=0.330, (0 split)
##       Y21 < 252.415 to the right, agree=0.783, adj=0.308, (0 split)
##       X42 < 949.225 to the left,  agree=0.782, adj=0.304, (0 split)
##       Y13 < 108.185 to the right, agree=0.780, adj=0.299, (0 split)
##       Y12 < 113.885 to the right, agree=0.780, adj=0.298, (0 split)
## 
## Node number 3: 13796 observations,    complexity param=0.08152149
##   mean=4008702, MSE=1.266275e+10 
##   left son=6 (10871 obs) right son=7 (2925 obs)
##   Primary splits:
##       Y8  < 0.005   to the right, improve=0.1890390, (0 missing)
##       Y7  < 0.115   to the right, improve=0.1705424, (0 missing)
##       X42 < 346.155 to the right, improve=0.1552648, (0 missing)
##       X8  < 975     to the left,  improve=0.1485809, (0 missing)
##       X41 < 424.65  to the right, improve=0.1463478, (0 missing)
##   Surrogate splits:
##       Y7  < 17.325  to the right, agree=0.955, adj=0.786, (0 split)
##       X8  < 975     to the left,  agree=0.953, adj=0.780, (0 split)
##       X7  < 945.115 to the left,  agree=0.898, adj=0.519, (0 split)
##       Y6  < 36.57   to the right, agree=0.885, adj=0.459, (0 split)
##       Y46 < 874.895 to the left,  agree=0.861, adj=0.346, (0 split)
## 
## Node number 4: 8919 observations,    complexity param=0.01468595
##   mean=3839138, MSE=6.018791e+09 
##   left son=8 (8129 obs) right son=9 (790 obs)
##   Primary splits:
##       Y36 < 888.9   to the left,  improve=0.11082490, (0 missing)
##       Y37 < 849.52  to the left,  improve=0.09979984, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08753593, (0 missing)
##       X38 < 301.375 to the right, improve=0.08475904, (0 missing)
##       Y31 < 762.785 to the left,  improve=0.08455488, (0 missing)
##   Surrogate splits:
##       Y22 < 597.305 to the left,  agree=0.950, adj=0.432, (0 split)
##       Y41 < 732.145 to the right, agree=0.939, adj=0.306, (0 split)
##       Y35 < 502.59  to the right, agree=0.938, adj=0.300, (0 split)
##       Y14 < 3.185   to the right, agree=0.936, adj=0.280, (0 split)
##       Y28 < 228.085 to the right, agree=0.936, adj=0.272, (0 split)
## 
## Node number 5: 4064 observations,    complexity param=0.01519979
##   mean=3907291, MSE=4.131366e+09 
##   left son=10 (1113 obs) right son=11 (2951 obs)
##   Primary splits:
##       X7  < 989.04  to the left,  improve=0.3667337, (0 missing)
##       Y42 < 722.315 to the right, improve=0.2568592, (0 missing)
##       Y45 < 954.95  to the left,  improve=0.2413413, (0 missing)
##       Y49 < 924.325 to the right, improve=0.2015504, (0 missing)
##       Y37 < 880.775 to the left,  improve=0.2007039, (0 missing)
##   Surrogate splits:
##       X6  < 777.195 to the left,  agree=0.861, adj=0.494, (0 split)
##       X49 < 704.45  to the left,  agree=0.838, adj=0.407, (0 split)
##       X22 < 125     to the right, agree=0.824, adj=0.358, (0 split)
##       X8  < 625     to the right, agree=0.822, adj=0.351, (0 split)
##       Y15 < 175     to the left,  agree=0.822, adj=0.350, (0 split)
## 
## Node number 6: 10871 observations,    complexity param=0.04833354
##   mean=3983323, MSE=1.278124e+10 
##   left son=12 (4919 obs) right son=13 (5952 obs)
##   Primary splits:
##       X11 < 689.855 to the left,  improve=0.1409181, (0 missing)
##       X44 < 232.635 to the right, improve=0.1344166, (0 missing)
##       X42 < 346.165 to the right, improve=0.1323906, (0 missing)
##       X41 < 424.65  to the right, improve=0.1261570, (0 missing)
##       X47 < 489.785 to the right, improve=0.1257598, (0 missing)
##   Surrogate splits:
##       X12 < 632.375 to the left,  agree=0.946, adj=0.880, (0 split)
##       X10 < 746.095 to the left,  agree=0.913, adj=0.807, (0 split)
##       X9  < 775     to the left,  agree=0.845, adj=0.658, (0 split)
##       Y12 < 153.435 to the right, agree=0.789, adj=0.533, (0 split)
##       Y10 < 150.61  to the right, agree=0.786, adj=0.528, (0 split)
## 
## Node number 7: 2925 observations
##   mean=4103024, MSE=9.320662e+08 
## 
## Node number 8: 8129 observations,    complexity param=0.01354492
##   mean=3831086, MSE=5.558648e+09 
##   left son=16 (6793 obs) right son=17 (1336 obs)
##   Primary splits:
##       Y35 < 765.05  to the left,  improve=0.12143140, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08900714, (0 missing)
##       X47 < 25      to the right, improve=0.08094458, (0 missing)
##       X5  < 970     to the left,  improve=0.08011203, (0 missing)
##       Y41 < 884.295 to the left,  improve=0.07540451, (0 missing)
##   Surrogate splits:
##       Y34 < 766.615 to the left,  agree=0.910, adj=0.455, (0 split)
##       Y33 < 775     to the left,  agree=0.877, adj=0.249, (0 split)
##       Y42 < 925     to the left,  agree=0.875, adj=0.238, (0 split)
##       X30 < 8.765   to the right, agree=0.863, adj=0.167, (0 split)
##       Y41 < 927.465 to the left,  agree=0.860, adj=0.147, (0 split)
## 
## Node number 9: 790 observations
##   mean=3921985, MSE=3.222901e+09 
## 
## Node number 10: 1113 observations
##   mean=3843910, MSE=2.99382e+09 
## 
## Node number 11: 2951 observations
##   mean=3931195, MSE=2.473852e+09 
## 
## Node number 12: 4919 observations,    complexity param=0.04293438
##   mean=3936640, MSE=1.519331e+10 
##   left son=24 (2716 obs) right son=25 (2203 obs)
##   Primary splits:
##       X6  < 848.785 to the left,  improve=0.2327215, (0 missing)
##       X5  < 995     to the left,  improve=0.2205077, (0 missing)
##       X8  < 816.01  to the left,  improve=0.2144716, (0 missing)
##       X1  < 575     to the left,  improve=0.1638708, (0 missing)
##       X39 < 325     to the right, improve=0.1629009, (0 missing)
##   Surrogate splits:
##       X5 < 942.285 to the left,  agree=0.981, adj=0.957, (0 split)
##       X8 < 816.01  to the left,  agree=0.979, adj=0.954, (0 split)
##       X7 < 875.59  to the left,  agree=0.953, adj=0.895, (0 split)
##       X1 < 575     to the left,  agree=0.770, adj=0.488, (0 split)
##       X2 < 375     to the left,  agree=0.768, adj=0.481, (0 split)
## 
## Node number 13: 5952 observations,    complexity param=0.02321593
##   mean=4021905, MSE=7.498166e+09 
##   left son=26 (1958 obs) right son=27 (3994 obs)
##   Primary splits:
##       X44 < 232.48  to the right, improve=0.2107311, (0 missing)
##       X43 < 289.965 to the right, improve=0.2058388, (0 missing)
##       X42 < 346.155 to the right, improve=0.1741573, (0 missing)
##       X47 < 689.725 to the right, improve=0.1671233, (0 missing)
##       X46 < 746.075 to the right, improve=0.1531187, (0 missing)
##   Surrogate splits:
##       X43 < 289.945 to the right, agree=0.968, adj=0.904, (0 split)
##       X42 < 346.155 to the right, agree=0.882, adj=0.640, (0 split)
##       X41 < 425     to the right, agree=0.865, adj=0.589, (0 split)
##       Y43 < 821     to the left,  agree=0.769, adj=0.298, (0 split)
##       Y44 < 825     to the left,  agree=0.769, adj=0.297, (0 split)
## 
## Node number 16: 6793 observations
##   mean=3819564, MSE=4.723845e+09 
## 
## Node number 17: 1336 observations
##   mean=3889670, MSE=5.696214e+09 
## 
## Node number 24: 2716 observations,    complexity param=0.02518016
##   mean=3883087, MSE=1.584461e+10 
##   left son=48 (1857 obs) right son=49 (859 obs)
##   Primary splits:
##       Y14 < 37.785  to the right, improve=0.2370324, (0 missing)
##       X15 < 689.855 to the left,  improve=0.2369716, (0 missing)
##       Y16 < 112.665 to the right, improve=0.2362359, (0 missing)
##       Y15 < 75.22   to the right, improve=0.2356853, (0 missing)
##       Y13 < 59.77   to the right, improve=0.2325060, (0 missing)
##   Surrogate splits:
##       Y15 < 75.22   to the right, agree=0.999, adj=0.997, (0 split)
##       Y16 < 112.665 to the right, agree=0.999, adj=0.995, (0 split)
##       Y13 < 31.035  to the right, agree=0.992, adj=0.973, (0 split)
##       Y10 < 37.785  to the right, agree=0.912, adj=0.721, (0 split)
##       Y11 < 75.22   to the right, agree=0.900, adj=0.685, (0 split)
## 
## Node number 25: 2203 observations
##   mean=4002664, MSE=6.495363e+09 
## 
## Node number 26: 1958 observations,    complexity param=0.01148675
##   mean=3965132, MSE=1.048974e+10 
##   left son=52 (1468 obs) right son=53 (490 obs)
##   Primary splits:
##       X48 < 32.6    to the right, improve=0.2265584, (0 missing)
##       X46 < 146.145 to the right, improve=0.2230136, (0 missing)
##       X47 < 89.965  to the right, improve=0.2216917, (0 missing)
##       X45 < 300     to the right, improve=0.2161759, (0 missing)
##       X38 < 546.065 to the right, improve=0.1853625, (0 missing)
##   Surrogate splits:
##       X46 < 146.145 to the right, agree=0.993, adj=0.973, (0 split)
##       X47 < 89.965  to the right, agree=0.993, adj=0.973, (0 split)
##       X45 < 300     to the right, agree=0.982, adj=0.929, (0 split)
##       Y44 < 712.155 to the right, agree=0.919, adj=0.678, (0 split)
##       Y43 < 674.97  to the right, agree=0.907, adj=0.627, (0 split)
## 
## Node number 27: 3994 observations
##   mean=4049737, MSE=3.676873e+09 
## 
## Node number 48: 1857 observations,    complexity param=0.01691646
##   mean=3841406, MSE=1.27667e+10 
##   left son=96 (1579 obs) right son=97 (278 obs)
##   Primary splits:
##       X1  < 900     to the left,  improve=0.2890544, (0 missing)
##       X2  < 922.925 to the left,  improve=0.2781630, (0 missing)
##       X3  < 869.51  to the left,  improve=0.2775724, (0 missing)
##       X4  < 826.82  to the left,  improve=0.2505744, (0 missing)
##       X42 < 346.165 to the right, improve=0.1243527, (0 missing)
##   Surrogate splits:
##       X2  < 922.925 to the left,  agree=0.997, adj=0.982, (0 split)
##       X3  < 869.51  to the left,  agree=0.997, adj=0.978, (0 split)
##       X4  < 826.82  to the left,  agree=0.987, adj=0.910, (0 split)
##       X10 < 746.095 to the left,  agree=0.858, adj=0.050, (0 split)
##       X26 < 145.88  to the right, agree=0.854, adj=0.022, (0 split)
## 
## Node number 49: 859 observations
##   mean=3973193, MSE=1.062369e+10 
## 
## Node number 52: 1468 observations
##   mean=3936967, MSE=1.019573e+10 
## 
## Node number 53: 490 observations
##   mean=4049512, MSE=1.874112e+09 
## 
## Node number 96: 1579 observations
##   mean=3815916, MSE=9.502688e+09 
## 
## Node number 97: 278 observations
##   mean=3986182, MSE=6.655327e+09
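# Predict Total_Power for the held-out test set
y <- predict(tree1, newdata = dataTE)

# Test RMSE: square each residual before summing.
# The value printed below was produced with sum(...)^2, i.e. the square applied after summing.
sqrt(sum((y - dataTE$Total_Power)^2) / length(dataTE$Total_Power))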
## [1] 723697.8
mean(dataTE$Total_Power)
## [1] 3942322
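The fits with different `maxdepth` values report the same 12 splits, and the CP table stops at 0.01, which is the default `cp` in `rpart.control`. The deepest nodes (96 and 97) sit at depth 6, so the depth cap does not appear to be the binding constraint here. The sketch below is an assumption-laden illustration on simulated data (the `toy` data frame and its columns are invented), showing that lowering `cp` is what lets a tree grow further under the same `maxdepth`.

```r
library(rpart)

# Simulated regression data, purely illustrative
set.seed(1)
n <- 2000
toy <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
toy$y <- sin(6 * toy$x1) + toy$x2^2 + rnorm(n, sd = 0.1)

# Default cp = 0.01: splits that gain less than 1% in relative error are not kept
fit_default <- rpart(y ~ ., data = toy, method = "anova", maxdepth = 9)

# Smaller cp: growth continues until maxdepth or the minimum node size stops it
fit_low_cp <- rpart(y ~ ., data = toy, method = "anova",
                    control = rpart.control(cp = 0.0005, maxdepth = 9))

# Count terminal nodes in each fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
c(default = n_leaves(fit_default), low_cp = n_leaves(fit_low_cp))
```

If deeper trees are the goal for the power data, the same idea would apply: pass a smaller `cp` alongside `maxdepth`, then use the `xerror` column to decide where to prune.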
# Regression tree for Total_Power: depth capped at 9, 10-fold cross-validation
tree1 <- rpart(dataTR$Total_Power ~ ., method = "anova", data = dataTR, maxdepth = 9, xval = 10)
summary(tree1)
## Call:
## rpart(formula = dataTR$Total_Power ~ ., data = dataTR, method = "anova", 
##     maxdepth = 9, xval = 10)
##   n= 26779 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.36278557      0 1.0000000 1.0000550 0.007210262
## 2  0.08152149      1 0.6372144 0.6373025 0.007603498
## 3  0.04833354      2 0.5556929 0.5558454 0.006622361
## 4  0.04293438      3 0.5073594 0.5075581 0.005956554
## 5  0.03201137      4 0.4644250 0.4647135 0.005547608
## 6  0.02518016      5 0.4324136 0.4327540 0.005368326
## 7  0.02321593      6 0.4072335 0.4100217 0.005187511
## 8  0.01691646      7 0.3840176 0.3875344 0.004759501
## 9  0.01519979      8 0.3671011 0.3807597 0.004669275
## 10 0.01468595      9 0.3519013 0.3593163 0.004581686
## 11 0.01354492     10 0.3372154 0.3475035 0.004530980
## 12 0.01148675     11 0.3236704 0.3314409 0.004454890
## 13 0.01000000     12 0.3121837 0.3217584 0.004349654
## 
## Variable importance
## Y46 Y45  Y4  Y3 X49 Y47  X8  Y7  X7  Y8  X6 X11 Y10 X12  X5 X10  X1  Y6  X2 Y12 
##  12  11  10   9   9   9   3   3   3   3   2   2   1   1   1   1   1   1   1   1 
## Y13  X9 Y15 Y14 Y16 X42 X44 X43 Y35 Y11  X3 
##   1   1   1   1   1   1   1   1   1   1   1 
## 
## Node number 1: 26779 observations,    complexity param=0.3627856
##   mean=3936837, MSE=1.512747e+10 
##   left son=2 (12983 obs) right son=3 (13796 obs)
##   Primary splits:
##       Y46 < 875.95  to the right, improve=0.3627856, (0 missing)
##       Y45 < 843.24  to the right, improve=0.3548318, (0 missing)
##       Y4  < 101.5   to the left,  improve=0.3292959, (0 missing)
##       Y3  < 74.275  to the left,  improve=0.3183650, (0 missing)
##       Y47 < 912.745 to the right, improve=0.2426537, (0 missing)
##   Surrogate splits:
##       Y45 < 843.24  to the right, agree=0.966, adj=0.929, (0 split)
##       Y4  < 101.5   to the left,  agree=0.952, adj=0.900, (0 split)
##       Y3  < 74.275  to the left,  agree=0.917, adj=0.828, (0 split)
##       Y47 < 894.03  to the right, agree=0.906, adj=0.805, (0 split)
##       X49 < 42.165  to the right, agree=0.903, adj=0.800, (0 split)
## 
## Node number 2: 12983 observations,    complexity param=0.03201137
##   mean=3860471, MSE=6.426806e+09 
##   left son=4 (8919 obs) right son=5 (4064 obs)
##   Primary splits:
##       Y7  < 1.025   to the right, improve=0.1554156, (0 missing)
##       Y23 < 594.465 to the left,  improve=0.1387768, (0 missing)
##       X7  < 989.66  to the left,  improve=0.1387669, (0 missing)
##       Y37 < 843.14  to the left,  improve=0.1363654, (0 missing)
##       Y43 < 987.355 to the left,  improve=0.1235102, (0 missing)
##   Surrogate splits:
##       Y5  < 0.005   to the right, agree=0.790, adj=0.330, (0 split)
##       Y21 < 252.415 to the right, agree=0.783, adj=0.308, (0 split)
##       X42 < 949.225 to the left,  agree=0.782, adj=0.304, (0 split)
##       Y13 < 108.185 to the right, agree=0.780, adj=0.299, (0 split)
##       Y12 < 113.885 to the right, agree=0.780, adj=0.298, (0 split)
## 
## Node number 3: 13796 observations,    complexity param=0.08152149
##   mean=4008702, MSE=1.266275e+10 
##   left son=6 (10871 obs) right son=7 (2925 obs)
##   Primary splits:
##       Y8  < 0.005   to the right, improve=0.1890390, (0 missing)
##       Y7  < 0.115   to the right, improve=0.1705424, (0 missing)
##       X42 < 346.155 to the right, improve=0.1552648, (0 missing)
##       X8  < 975     to the left,  improve=0.1485809, (0 missing)
##       X41 < 424.65  to the right, improve=0.1463478, (0 missing)
##   Surrogate splits:
##       Y7  < 17.325  to the right, agree=0.955, adj=0.786, (0 split)
##       X8  < 975     to the left,  agree=0.953, adj=0.780, (0 split)
##       X7  < 945.115 to the left,  agree=0.898, adj=0.519, (0 split)
##       Y6  < 36.57   to the right, agree=0.885, adj=0.459, (0 split)
##       Y46 < 874.895 to the left,  agree=0.861, adj=0.346, (0 split)
## 
## Node number 4: 8919 observations,    complexity param=0.01468595
##   mean=3839138, MSE=6.018791e+09 
##   left son=8 (8129 obs) right son=9 (790 obs)
##   Primary splits:
##       Y36 < 888.9   to the left,  improve=0.11082490, (0 missing)
##       Y37 < 849.52  to the left,  improve=0.09979984, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08753593, (0 missing)
##       X38 < 301.375 to the right, improve=0.08475904, (0 missing)
##       Y31 < 762.785 to the left,  improve=0.08455488, (0 missing)
##   Surrogate splits:
##       Y22 < 597.305 to the left,  agree=0.950, adj=0.432, (0 split)
##       Y41 < 732.145 to the right, agree=0.939, adj=0.306, (0 split)
##       Y35 < 502.59  to the right, agree=0.938, adj=0.300, (0 split)
##       Y14 < 3.185   to the right, agree=0.936, adj=0.280, (0 split)
##       Y28 < 228.085 to the right, agree=0.936, adj=0.272, (0 split)
## 
## Node number 5: 4064 observations,    complexity param=0.01519979
##   mean=3907291, MSE=4.131366e+09 
##   left son=10 (1113 obs) right son=11 (2951 obs)
##   Primary splits:
##       X7  < 989.04  to the left,  improve=0.3667337, (0 missing)
##       Y42 < 722.315 to the right, improve=0.2568592, (0 missing)
##       Y45 < 954.95  to the left,  improve=0.2413413, (0 missing)
##       Y49 < 924.325 to the right, improve=0.2015504, (0 missing)
##       Y37 < 880.775 to the left,  improve=0.2007039, (0 missing)
##   Surrogate splits:
##       X6  < 777.195 to the left,  agree=0.861, adj=0.494, (0 split)
##       X49 < 704.45  to the left,  agree=0.838, adj=0.407, (0 split)
##       X22 < 125     to the right, agree=0.824, adj=0.358, (0 split)
##       X8  < 625     to the right, agree=0.822, adj=0.351, (0 split)
##       Y15 < 175     to the left,  agree=0.822, adj=0.350, (0 split)
## 
## Node number 6: 10871 observations,    complexity param=0.04833354
##   mean=3983323, MSE=1.278124e+10 
##   left son=12 (4919 obs) right son=13 (5952 obs)
##   Primary splits:
##       X11 < 689.855 to the left,  improve=0.1409181, (0 missing)
##       X44 < 232.635 to the right, improve=0.1344166, (0 missing)
##       X42 < 346.165 to the right, improve=0.1323906, (0 missing)
##       X41 < 424.65  to the right, improve=0.1261570, (0 missing)
##       X47 < 489.785 to the right, improve=0.1257598, (0 missing)
##   Surrogate splits:
##       X12 < 632.375 to the left,  agree=0.946, adj=0.880, (0 split)
##       X10 < 746.095 to the left,  agree=0.913, adj=0.807, (0 split)
##       X9  < 775     to the left,  agree=0.845, adj=0.658, (0 split)
##       Y12 < 153.435 to the right, agree=0.789, adj=0.533, (0 split)
##       Y10 < 150.61  to the right, agree=0.786, adj=0.528, (0 split)
## 
## Node number 7: 2925 observations
##   mean=4103024, MSE=9.320662e+08 
## 
## Node number 8: 8129 observations,    complexity param=0.01354492
##   mean=3831086, MSE=5.558648e+09 
##   left son=16 (6793 obs) right son=17 (1336 obs)
##   Primary splits:
##       Y35 < 765.05  to the left,  improve=0.12143140, (0 missing)
##       Y38 < 848.49  to the left,  improve=0.08900714, (0 missing)
##       X47 < 25      to the right, improve=0.08094458, (0 missing)
##       X5  < 970     to the left,  improve=0.08011203, (0 missing)
##       Y41 < 884.295 to the left,  improve=0.07540451, (0 missing)
##   Surrogate splits:
##       Y34 < 766.615 to the left,  agree=0.910, adj=0.455, (0 split)
##       Y33 < 775     to the left,  agree=0.877, adj=0.249, (0 split)
##       Y42 < 925     to the left,  agree=0.875, adj=0.238, (0 split)
##       X30 < 8.765   to the right, agree=0.863, adj=0.167, (0 split)
##       Y41 < 927.465 to the left,  agree=0.860, adj=0.147, (0 split)
## 
## Node number 9: 790 observations
##   mean=3921985, MSE=3.222901e+09 
## 
## Node number 10: 1113 observations
##   mean=3843910, MSE=2.99382e+09 
## 
## Node number 11: 2951 observations
##   mean=3931195, MSE=2.473852e+09 
## 
## Node number 12: 4919 observations,    complexity param=0.04293438
##   mean=3936640, MSE=1.519331e+10 
##   left son=24 (2716 obs) right son=25 (2203 obs)
##   Primary splits:
##       X6  < 848.785 to the left,  improve=0.2327215, (0 missing)
##       X5  < 995     to the left,  improve=0.2205077, (0 missing)
##       X8  < 816.01  to the left,  improve=0.2144716, (0 missing)
##       X1  < 575     to the left,  improve=0.1638708, (0 missing)
##       X39 < 325     to the right, improve=0.1629009, (0 missing)
##   Surrogate splits:
##       X5 < 942.285 to the left,  agree=0.981, adj=0.957, (0 split)
##       X8 < 816.01  to the left,  agree=0.979, adj=0.954, (0 split)
##       X7 < 875.59  to the left,  agree=0.953, adj=0.895, (0 split)
##       X1 < 575     to the left,  agree=0.770, adj=0.488, (0 split)
##       X2 < 375     to the left,  agree=0.768, adj=0.481, (0 split)
## 
## Node number 13: 5952 observations,    complexity param=0.02321593
##   mean=4021905, MSE=7.498166e+09 
##   left son=26 (1958 obs) right son=27 (3994 obs)
##   Primary splits:
##       X44 < 232.48  to the right, improve=0.2107311, (0 missing)
##       X43 < 289.965 to the right, improve=0.2058388, (0 missing)
##       X42 < 346.155 to the right, improve=0.1741573, (0 missing)
##       X47 < 689.725 to the right, improve=0.1671233, (0 missing)
##       X46 < 746.075 to the right, improve=0.1531187, (0 missing)
##   Surrogate splits:
##       X43 < 289.945 to the right, agree=0.968, adj=0.904, (0 split)
##       X42 < 346.155 to the right, agree=0.882, adj=0.640, (0 split)
##       X41 < 425     to the right, agree=0.865, adj=0.589, (0 split)
##       Y43 < 821     to the left,  agree=0.769, adj=0.298, (0 split)
##       Y44 < 825     to the left,  agree=0.769, adj=0.297, (0 split)
## 
## Node number 16: 6793 observations
##   mean=3819564, MSE=4.723845e+09 
## 
## Node number 17: 1336 observations
##   mean=3889670, MSE=5.696214e+09 
## 
## Node number 24: 2716 observations,    complexity param=0.02518016
##   mean=3883087, MSE=1.584461e+10 
##   left son=48 (1857 obs) right son=49 (859 obs)
##   Primary splits:
##       Y14 < 37.785  to the right, improve=0.2370324, (0 missing)
##       X15 < 689.855 to the left,  improve=0.2369716, (0 missing)
##       Y16 < 112.665 to the right, improve=0.2362359, (0 missing)
##       Y15 < 75.22   to the right, improve=0.2356853, (0 missing)
##       Y13 < 59.77   to the right, improve=0.2325060, (0 missing)
##   Surrogate splits:
##       Y15 < 75.22   to the right, agree=0.999, adj=0.997, (0 split)
##       Y16 < 112.665 to the right, agree=0.999, adj=0.995, (0 split)
##       Y13 < 31.035  to the right, agree=0.992, adj=0.973, (0 split)
##       Y10 < 37.785  to the right, agree=0.912, adj=0.721, (0 split)
##       Y11 < 75.22   to the right, agree=0.900, adj=0.685, (0 split)
## 
## Node number 25: 2203 observations
##   mean=4002664, MSE=6.495363e+09 
## 
## Node number 26: 1958 observations,    complexity param=0.01148675
##   mean=3965132, MSE=1.048974e+10 
##   left son=52 (1468 obs) right son=53 (490 obs)
##   Primary splits:
##       X48 < 32.6    to the right, improve=0.2265584, (0 missing)
##       X46 < 146.145 to the right, improve=0.2230136, (0 missing)
##       X47 < 89.965  to the right, improve=0.2216917, (0 missing)
##       X45 < 300     to the right, improve=0.2161759, (0 missing)
##       X38 < 546.065 to the right, improve=0.1853625, (0 missing)
##   Surrogate splits:
##       X46 < 146.145 to the right, agree=0.993, adj=0.973, (0 split)
##       X47 < 89.965  to the right, agree=0.993, adj=0.973, (0 split)
##       X45 < 300     to the right, agree=0.982, adj=0.929, (0 split)
##       Y44 < 712.155 to the right, agree=0.919, adj=0.678, (0 split)
##       Y43 < 674.97  to the right, agree=0.907, adj=0.627, (0 split)
## 
## Node number 27: 3994 observations
##   mean=4049737, MSE=3.676873e+09 
## 
## Node number 48: 1857 observations,    complexity param=0.01691646
##   mean=3841406, MSE=1.27667e+10 
##   left son=96 (1579 obs) right son=97 (278 obs)
##   Primary splits:
##       X1  < 900     to the left,  improve=0.2890544, (0 missing)
##       X2  < 922.925 to the left,  improve=0.2781630, (0 missing)
##       X3  < 869.51  to the left,  improve=0.2775724, (0 missing)
##       X4  < 826.82  to the left,  improve=0.2505744, (0 missing)
##       X42 < 346.165 to the right, improve=0.1243527, (0 missing)
##   Surrogate splits:
##       X2  < 922.925 to the left,  agree=0.997, adj=0.982, (0 split)
##       X3  < 869.51  to the left,  agree=0.997, adj=0.978, (0 split)
##       X4  < 826.82  to the left,  agree=0.987, adj=0.910, (0 split)
##       X10 < 746.095 to the left,  agree=0.858, adj=0.050, (0 split)
##       X26 < 145.88  to the right, agree=0.854, adj=0.022, (0 split)
## 
## Node number 49: 859 observations
##   mean=3973193, MSE=1.062369e+10 
## 
## Node number 52: 1468 observations
##   mean=3936967, MSE=1.019573e+10 
## 
## Node number 53: 490 observations
##   mean=4049512, MSE=1.874112e+09 
## 
## Node number 96: 1579 observations
##   mean=3815916, MSE=9.502688e+09 
## 
## Node number 97: 278 observations
##   mean=3986182, MSE=6.655327e+09
y <- predict(tree1,newdata=dataTE)

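sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power)) #note: this squares the summed error, so positive and negative errors cancel; a standard RMSE would be sqrt(mean((y-dataTE$Total_Power)^2))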
## [1] 723697.8
mean(dataTE$Total_Power)
## [1] 3942322
#rmse did not change when the depth was increased, which suggests that additional axis-parallel splits do not help: the data cannot be partitioned well by axis-parallel splits on these features

###RF

rf.power=randomForest(Total_Power~.,data=dataTR,mtry=4,ntree=500,nodesize=5) #trControl is a caret argument that randomForest() ignores, so it is dropped here
plot(rf.power)

varImpPlot(rf.power)

y = predict(rf.power,newdata=dataTE)
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power)) #18927.85 in an earlier run; this is the error on the test set, whose mean is 3942322
## [1] 16844.83
mean(dataTE$Total_Power)
## [1] 3942322
rf.power=randomForest(Total_Power~.,data=dataTR,mtry=5,ntree=500,nodesize=5) #trControl is a caret argument that randomForest() ignores, so it is dropped here
plot(rf.power)

varImpPlot(rf.power)

y = predict(rf.power,newdata=dataTE)
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power)) #the error has increased relative to mtry=4; random forest models take a long time to train, so only one further value (mtry=6) is tried below
## [1] 20102.42
mean(dataTE$Total_Power)
## [1] 3942322
rf.power=randomForest(Total_Power~.,data=dataTR,mtry=6,ntree=500,nodesize=5) #trControl is a caret argument that randomForest() ignores, so it is dropped here
plot(rf.power)

varImpPlot(rf.power)

y = predict(rf.power,newdata=dataTE)
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power))
## [1] 21757.07
mean(dataTE$Total_Power)
## [1] 3942322
####gbm

noftrees=100
depth=5
learning_rate=0.2
sampling_fraction=0.5


boosting_model=gbm(Total_Power~.,distribution="gaussian", data=dataTR, n.trees = noftrees,interaction.depth = depth,cv.folds=10,class.stratify.cv=TRUE, 
                   n.minobsinnode = 5, shrinkage =learning_rate,
                   bag.fraction = sampling_fraction)
## Warning in getStratify(class.stratify.cv, d = distribution): You can only use
## class.stratify.cv when distribution is bernoulli or multinomial. Ignored.
boosting_model
## gbm(formula = Total_Power ~ ., distribution = "gaussian", data = dataTR, 
##     n.trees = noftrees, interaction.depth = depth, n.minobsinnode = 5, 
##     shrinkage = learning_rate, bag.fraction = sampling_fraction, 
##     cv.folds = 10, class.stratify.cv = TRUE)
## A gradient boosted model with gaussian loss function.
## 100 iterations were performed.
## The best cross-validation iteration was 100.
## There were 98 predictors of which 96 had non-zero influence.
summary(boosting_model)

##     var      rel.inf
## Y46 Y46 22.335202632
## Y45 Y45  9.304194068
## Y8   Y8  4.740184056
## X6   X6  4.324443244
## Y3   Y3  3.970096483
## X47 X47  3.289969447
## Y7   Y7  2.873487606
## X42 X42  2.838441351
## X11 X11  2.773669148
## X3   X3  2.743892135
## X7   X7  2.571555983
## X38 X38  2.567771144
## X43 X43  2.538228173
## X5   X5  2.256178086
## Y13 Y13  2.070846923
## X19 X19  1.539503492
## X1   X1  1.424546107
## X2   X2  1.414686716
## X36 X36  1.338021402
## X45 X45  1.278390246
## Y20 Y20  1.215424594
## X14 X14  1.135764581
## X44 X44  1.115742408
## X48 X48  1.055953120
## X4   X4  0.975444350
## Y23 Y23  0.973852885
## Y37 Y37  0.913969248
## X13 X13  0.901247158
## Y10 Y10  0.895965656
## X37 X37  0.894151354
## Y12 Y12  0.848077748
## Y34 Y34  0.783339692
## Y11 Y11  0.709230112
## Y14 Y14  0.702034577
## Y43 Y43  0.644488899
## Y31 Y31  0.567753203
## X9   X9  0.521053653
## X12 X12  0.514445758
## X46 X46  0.392110747
## X20 X20  0.383045632
## X33 X33  0.278570110
## X41 X41  0.270337901
## Y6   Y6  0.250445138
## Y29 Y29  0.248624888
## X34 X34  0.222041784
## X40 X40  0.215304382
## X31 X31  0.213191084
## Y40 Y40  0.210763501
## Y49 Y49  0.185846934
## X30 X30  0.183874684
## X35 X35  0.178644488
## Y27 Y27  0.174853335
## Y36 Y36  0.173469541
## X10 X10  0.167572226
## X8   X8  0.165314500
## X32 X32  0.141564506
## X18 X18  0.137132126
## Y39 Y39  0.136251135
## X28 X28  0.132448824
## X27 X27  0.130009250
## X39 X39  0.127056378
## Y17 Y17  0.124725009
## Y2   Y2  0.121623253
## X24 X24  0.099818020
## X21 X21  0.098936829
## Y30 Y30  0.089716643
## X22 X22  0.088708039
## Y32 Y32  0.087754630
## X29 X29  0.079542849
## X23 X23  0.073928441
## X49 X49  0.072386570
## Y38 Y38  0.070987644
## X25 X25  0.069480148
## X15 X15  0.067471340
## X26 X26  0.056321925
## Y25 Y25  0.049554716
## X16 X16  0.047297843
## Y21 Y21  0.045003140
## Y24 Y24  0.038306194
## Y26 Y26  0.036488645
## Y22 Y22  0.031871015
## Y44 Y44  0.031028496
## Y4   Y4  0.027868172
## Y33 Y33  0.026996122
## Y42 Y42  0.022541607
## Y19 Y19  0.022078963
## Y9   Y9  0.018199276
## Y35 Y35  0.018162842
## Y16 Y16  0.017438249
## Y41 Y41  0.016383411
## Y28 Y28  0.014770319
## X17 X17  0.013786386
## Y5   Y5  0.012242341
## Y47 Y47  0.012102195
## Y15 Y15  0.010544346
## Y48 Y48  0.006213846
## Y1   Y1  0.000000000
## Y18 Y18  0.000000000
y = predict(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)
## Using 100 trees...
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power))
## [1] 30151.75
mean(dataTE$Total_Power)
## [1] 3942322
noftrees=300
depth=5
learning_rate=0.1
sampling_fraction=0.5


boosting_model=gbm(Total_Power~.,distribution="gaussian", data=dataTR, n.trees = noftrees,interaction.depth = depth,cv.folds=10,class.stratify.cv=TRUE, 
                   n.minobsinnode = 5, shrinkage =learning_rate,
                   bag.fraction = sampling_fraction)
## Warning in getStratify(class.stratify.cv, d = distribution): You can only use
## class.stratify.cv when distribution is bernoulli or multinomial. Ignored.
boosting_model
## gbm(formula = Total_Power ~ ., distribution = "gaussian", data = dataTR, 
##     n.trees = noftrees, interaction.depth = depth, n.minobsinnode = 5, 
##     shrinkage = learning_rate, bag.fraction = sampling_fraction, 
##     cv.folds = 10, class.stratify.cv = TRUE)
## A gradient boosted model with gaussian loss function.
## 300 iterations were performed.
## The best cross-validation iteration was 300.
## There were 98 predictors of which 98 had non-zero influence.
summary(boosting_model)

##     var      rel.inf
## Y46 Y46 20.578634905
## Y45 Y45  9.753307614
## Y8   Y8  5.383690021
## Y3   Y3  4.319253831
## X7   X7  3.184158836
## X47 X47  2.886277035
## X6   X6  2.715674793
## X5   X5  2.626163346
## X3   X3  2.441464466
## X43 X43  2.373131636
## X44 X44  2.321899446
## X11 X11  2.110678341
## Y12 Y12  2.095461379
## Y7   Y7  1.926756690
## X48 X48  1.743620948
## X42 X42  1.647840849
## X2   X2  1.619345499
## X38 X38  1.529192962
## X45 X45  1.487430102
## Y14 Y14  1.355208890
## X1   X1  1.303915405
## X13 X13  1.273238388
## X4   X4  1.176878719
## Y23 Y23  1.158859788
## Y37 Y37  1.044177052
## Y10 Y10  0.945070477
## Y20 Y20  0.923564137
## X19 X19  0.912619847
## Y31 Y31  0.890049568
## X39 X39  0.829686429
## X36 X36  0.813771821
## X40 X40  0.789548867
## X14 X14  0.762783843
## Y36 Y36  0.753214406
## X37 X37  0.679788907
## Y30 Y30  0.627551648
## Y38 Y38  0.595540294
## X9   X9  0.589190675
## X46 X46  0.540341681
## Y44 Y44  0.536991347
## X41 X41  0.533705810
## X15 X15  0.533621031
## Y13 Y13  0.505870744
## Y9   Y9  0.477371099
## Y6   Y6  0.472326973
## X35 X35  0.441311835
## X18 X18  0.434466086
## Y43 Y43  0.337363940
## X12 X12  0.330551291
## X30 X30  0.263553845
## X20 X20  0.257542203
## X34 X34  0.243680337
## X10 X10  0.232955500
## Y34 Y34  0.214946978
## X32 X32  0.189965759
## Y15 Y15  0.187040818
## X21 X21  0.181532321
## X27 X27  0.179954061
## X28 X28  0.176779834
## X22 X22  0.175028228
## Y25 Y25  0.165696739
## X31 X31  0.159550616
## X24 X24  0.144331977
## X49 X49  0.142042038
## Y26 Y26  0.125197007
## X16 X16  0.123455796
## Y32 Y32  0.122564082
## X29 X29  0.121532449
## X8   X8  0.115049350
## Y40 Y40  0.105768690
## X33 X33  0.097533035
## X23 X23  0.079892985
## Y2   Y2  0.070472737
## Y16 Y16  0.069473738
## X17 X17  0.069034838
## Y42 Y42  0.061621392
## Y39 Y39  0.056140430
## Y17 Y17  0.053252838
## Y22 Y22  0.049086641
## X26 X26  0.040893526
## Y29 Y29  0.038412433
## Y28 Y28  0.037985632
## Y21 Y21  0.037240684
## Y35 Y35  0.036010537
## Y18 Y18  0.030019179
## Y33 Y33  0.025390821
## Y24 Y24  0.024695280
## Y47 Y47  0.022680649
## Y49 Y49  0.021913473
## Y48 Y48  0.021315880
## Y27 Y27  0.020850231
## X25 X25  0.020670981
## Y1   Y1  0.019540280
## Y19 Y19  0.015107443
## Y11 Y11  0.013582710
## Y41 Y41  0.012404382
## Y5   Y5  0.008683305
## Y4   Y4  0.005361619
y = predict(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)
## Using 300 trees...
y = predict(rf.power,newdata=dataTE) #note: this overwrites the boosting predictions with the random forest (mtry=6) predictions
sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power)) #so the value printed below repeats the random forest result; the boosting error was recorded as 30246.16 and the mean of the target is 3942322
## [1] 21757.07
mean(dataTE$Total_Power)
## [1] 3942322

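The Random Forest algorithm produced the best predictions with respect to the RMSE metric as computed above: 16844.83 with mtry=4, compared with 30151.75 for gradient boosting with 100 trees and 723697.8 for the regression tree, against a test-set mean of 3942322.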
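The error expression used in the chunks above, `sqrt(sum((y-dataTE$Total_Power))^2/length(dataTE$Total_Power))`, squares the sum of the errors rather than summing the squared errors. The sketch below contrasts it with a conventional RMSE on made-up numbers. The function names `rmse` and `original_metric` and the vectors `pred` and `actual` are illustrative assumptions, not objects from the analysis.

```r
# Conventional RMSE: square each error, average, then take the root
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

# The expression as written in the chunks above: the errors are summed
# first and only then squared, so errors of opposite sign cancel out
original_metric <- function(pred, actual) {
  sqrt(sum(pred - actual)^2 / length(actual))
}

# Hypothetical predictions and targets whose errors are -2, 2, -3, 3
pred   <- c(10, 20, 30, 40)
actual <- c(12, 18, 33, 37)

rmse(pred, actual)             # sqrt(6.5), about 2.55
original_metric(pred, actual)  # 0, because the errors sum to zero
```

Under this reading, the values reported above measure the aggregate bias of each model scaled by the square root of the test-set size, so the ranking of the learners might change if a conventional RMSE were used.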

##Isolet Data This dataset is about predicting which letter a speaker has pronounced. There are 26 classes, which are stored in the X1. column. None of the methods was able to produce an accurate result.
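A column name such as X1. is consistent with what `read.csv()` produces when a file without a header row is read with the default `header = TRUE`: the first data row is converted into syntactically valid column names and is lost as an observation. The sketch below illustrates this on a made-up two-row string. The object `raw` and its values are hypothetical and are not taken from the Isolet files.

```r
# Two hypothetical data rows with no header line; the last field is the class
raw <- "0.5,-1.0,1.\n0.2,0.3,2.\n"

# Default behaviour: the first row is consumed as column names
with_header <- read.csv(text = raw)
colnames(with_header)  # "X0.5" "X.1.0" "X1."
nrow(with_header)      # 1

# header = FALSE keeps both rows and assigns the names V1, V2, V3
no_header <- read.csv(text = raw, header = FALSE)
colnames(no_header)    # "V1" "V2" "V3"
nrow(no_header)        # 2
```

If the Isolet files indeed have no header row, then the training and test files receive different column names, and matching columns by name pairs them by coincidence of first-row values rather than by position.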

####Isoletdata
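#note: read.csv() uses header=TRUE by default, so the first row of each file becomes the column names (e.g. X1., X.1.0000.35)
dataTR <- data.table(read.csv("isolet1+2+3+4.data",stringsAsFactors=T))
dataTE <- data.table(read.csv("isolet5.data",stringsAsFactors=T))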

#keep only the columns whose names appear in both data sets, in the column order of dataTR
common <- intersect(colnames(dataTR), colnames(dataTE))
dataTE <- dataTE[, common, with = FALSE]
dataTR <- dataTR[, common, with = FALSE]

##knn

knnFit <- train(X1.~ ., data = dataTR, method = "knn", trControl = trainControl(method = "cv"),preProcess = c("center","scale"), tuneGrid = expand.grid(k=c(3,5,7,9,11)))
knnFit #k=9 is selected as the best model with respect to RMSE
## k-Nearest Neighbors 
## 
## 6237 samples
##  104 predictor
## 
## Pre-processing: centered (104), scaled (104) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 5613, 5613, 5613, 5613, 5613, 5614, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    3  4.358890  0.6706209  2.246300
##    5  4.248714  0.6841913  2.355829
##    7  4.255512  0.6826289  2.444146
##    9  4.222565  0.6877146  2.491459
##   11  4.252735  0.6837470  2.562791
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 9.
y <- predict(knnFit,newdata=dataTE)
y <- round(y)
tab <- table(y,dataTE$X1.)
sum(diag(tab))/length(dataTE$X1.) #accuracy was 0.057 in an earlier run (0.049 here); knn does not work well, possibly because the dimension of the space is too high
## [1] 0.04878049
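The accuracy above is read off the diagonal of `table(y,dataTE$X1.)`, whose rows are the rounded predictions. If some class is never predicted, the table is not square and its diagonal no longer pairs each class with itself. The sketch below, on made-up labels, shows the effect and one possible remedy, which is to give both factors the same levels. The vectors `truth` and `pred` are hypothetical.

```r
# Hypothetical labels: class 2 occurs in the truth but is never predicted
truth <- c(1, 2, 3, 3)
pred  <- c(1, 3, 3, 3)

# Naive table: rows are {1, 3} and columns are {1, 2, 3}, so the second
# diagonal cell pairs predicted class 3 with true class 2
tab_naive <- table(pred, truth)
naive_accuracy <- sum(diag(tab_naive)) / length(truth)  # 0.5

# Aligned table: both factors share one set of levels, so the table is
# square and the diagonal counts exact matches only
lv  <- sort(unique(c(truth, pred)))
tab <- table(factor(pred, levels = lv), factor(truth, levels = lv))
accuracy <- sum(diag(tab)) / length(truth)              # 0.75

# The same value without a table
mean(pred == truth)                                     # 0.75
```

Because the kNN model here is fitted as a regression on the numeric class code and then rounded, predictions may also fall outside the set of observed classes, which is another reason to fix the levels explicitly.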
#Rpart

tree1=rpart(X1.~.,method="class",data=dataTR,maxdepth=10,xval=10)
summary(tree1)
## Call:
## rpart(formula = X1. ~ ., data = dataTR, method = "class", maxdepth = 10, 
##     xval = 10)
##   n= 6237 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.03985326      0 1.0000000 1.0155078 0.001997770
## 2  0.03518426      2 0.9202935 0.9414707 0.003856934
## 3  0.03410038      4 0.8499250 0.8866100 0.004669876
## 4  0.02943138      6 0.7817242 0.8044022 0.005512553
## 5  0.02651326      8 0.7228614 0.7205269 0.006075303
## 6  0.02417876     11 0.6433217 0.6319827 0.006430055
## 7  0.02376188     12 0.6191429 0.6276472 0.006441914
## 8  0.02151076     14 0.5716191 0.6121394 0.006480352
## 9  0.01884275     15 0.5501084 0.5679506 0.006556474
## 10 0.01750875     16 0.5312656 0.5436051 0.006577734
## 11 0.01667500     17 0.5137569 0.5227614 0.006584423
## 12 0.01567450     18 0.4970819 0.5090879 0.006583063
## 13 0.01033850     19 0.4814074 0.4950809 0.006576945
## 14 0.01017175     20 0.4710689 0.4837419 0.006568481
## 15 0.01000000     21 0.4608971 0.4834084 0.006568185
## 
## Variable importance
##  X1.0000.17  X1.0000.16  X1.0000.18 X.1.0000.35  X1.0000.19  X1.0000.21 
##           4           4           4           4           4           4 
##  X1.0000.20 X.1.0000.36 X.1.0000.37 X.1.0000.20 X.1.0000.50 X.1.0000.29 
##           3           3           3           3           3           3 
##  X.1.0000.5 X.1.0000.43    X.0.9268 X.1.0000.42 X.1.0000.44   X1.0000.5 
##           3           2           2           2           2           2 
## X.1.0000.19  X.1.0000.4   X1.0000.6 X.1.0000.41 X.1.0000.33 X.1.0000.38 
##           2           2           2           2           2           2 
## X.1.0000.14     X0.0790 X.1.0000.39   X1.0000.7 X.1.0000.32   X1.0000.3 
##           2           2           2           2           1           1 
## X.1.0000.34     X0.1718 X.1.0000.31  X1.0000.11  X1.0000.15  X1.0000.14 
##           1           1           1           1           1           1 
##    X.0.4000 X.1.0000.40    X.0.2000 X.1.0000.30 X.1.0000.28  X1.0000.29 
##           1           1           1           1           1           1 
## X.1.0000.21 X.1.0000.52     X0.3152    X.0.6000   X1.0000.2 
##           1           1           1           1           1 
## 
## Node number 1: 6237 observations,    complexity param=0.03985326
##   predicted class=2   expected loss=0.96152  P(node) =1
##     class counts:   239   240   240   240   240   238   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240
##    probabilities: 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 
##   left son=2 (5328 obs) right son=3 (909 obs)
##   Primary splits:
##       X.1.0000.43 < -0.3167 to the left,  improve=219.5324, (0 missing)
##       X.1.0000.44 < -0.4684 to the left,  improve=216.6450, (0 missing)
##       X.1.0000.42 < -0.2417 to the left,  improve=214.5795, (0 missing)
##       X.1.0000.37 < 0.7483  to the left,  improve=208.9374, (0 missing)
##       X.1.0000.38 < 0.7982  to the left,  improve=204.3112, (0 missing)
##   Surrogate splits:
##       X.1.0000.42 < -0.2451 to the left,  agree=0.993, adj=0.953, (0 split)
##       X.1.0000.44 < -0.4617 to the left,  agree=0.992, adj=0.944, (0 split)
##       X.1.0000.41 < -0.2384 to the left,  agree=0.983, adj=0.882, (0 split)
##       X.1.0000.29 < -0.9691 to the left,  agree=0.971, adj=0.799, (0 split)
##       X.1.0000.14 < -0.99   to the left,  agree=0.968, adj=0.783, (0 split)
## 
## Node number 2: 5328 observations,    complexity param=0.03985326
##   predicted class=4   expected loss=0.954955  P(node) =0.8542569
##     class counts:   239   236   239   240   239    31   240     9   240   240   240   239   240   240   240   239   240   240     2   240   240   239   239    18   240   239
##    probabilities: 0.045 0.044 0.045 0.045 0.045 0.006 0.045 0.002 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.000 0.045 0.045 0.045 0.045 0.003 0.045 0.045 
##   left son=4 (2713 obs) right son=5 (2615 obs)
##   Primary splits:
##       X1.0000.21 < 0.5396  to the left,  improve=198.8763, (0 missing)
##       X1.0000.20 < 0.602   to the left,  improve=191.2544, (0 missing)
##       X1.0000.19 < 0.5761  to the right, improve=188.5564, (0 missing)
##       X1.0000.18 < 0.5755  to the right, improve=185.1640, (0 missing)
##       X1.0000.17 < 0.6002  to the right, improve=184.1110, (0 missing)
##   Surrogate splits:
##       X1.0000.20 < 0.5351  to the left,  agree=0.980, adj=0.959, (0 split)
##       X1.0000.19 < 0.5059  to the left,  agree=0.968, adj=0.934, (0 split)
##       X1.0000.18 < 0.46    to the left,  agree=0.956, adj=0.909, (0 split)
##       X1.0000.17 < 0.4483  to the left,  agree=0.950, adj=0.898, (0 split)
##       X1.0000.16 < 0.4191  to the left,  agree=0.945, adj=0.888, (0 split)
## 
## Node number 3: 909 observations,    complexity param=0.03518426
##   predicted class=19  expected loss=0.7381738  P(node) =0.1457431
##     class counts:     0     4     1     0     1   207     0   231     0     0     0     1     0     0     0     1     0     0   238     0     0     1     1   222     0     1
##    probabilities: 0.000 0.004 0.001 0.000 0.001 0.228 0.000 0.254 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.000 0.000 0.262 0.000 0.000 0.001 0.001 0.244 0.000 0.001 
##   left son=6 (677 obs) right son=7 (232 obs)
##   Primary splits:
##       X.1.0000.37 < 0.7483  to the left,  improve=194.6172, (0 missing)
##       X.1.0000.36 < -0.3584 to the right, improve=193.9371, (0 missing)
##       X.1.0000.38 < 0.7982  to the left,  improve=187.1875, (0 missing)
##       X.1.0000.39 < 0.8283  to the left,  improve=174.9514, (0 missing)
##       X.0.9268    < -0.4701 to the right, improve=164.1616, (0 missing)
##   Surrogate splits:
##       X.1.0000.38 < 0.8066  to the left,  agree=0.970, adj=0.884, (0 split)
##       X.1.0000.36 < 0.6983  to the left,  agree=0.956, adj=0.828, (0 split)
##       X.1.0000.39 < 0.8749  to the left,  agree=0.952, adj=0.810, (0 split)
##       X.0.9268    < 0.3274  to the left,  agree=0.887, adj=0.556, (0 split)
##       X.1.0000.40 < 0.9433  to the left,  agree=0.886, adj=0.552, (0 split)
## 
## Node number 4: 2713 observations,    complexity param=0.03410038
##   predicted class=21  expected loss=0.911537  P(node) =0.4349848
##     class counts:     7   231   219   235   233     2   231     1     0    52     7     0     5     7     0   231   223     0     0   233   240   233    77     0    11   235
##    probabilities: 0.003 0.085 0.081 0.087 0.086 0.001 0.085 0.000 0.000 0.019 0.003 0.000 0.002 0.003 0.000 0.085 0.082 0.000 0.000 0.086 0.088 0.086 0.028 0.000 0.004 0.087 
##   left son=8 (245 obs) right son=9 (2468 obs)
##   Primary splits:
##       X.1.0000.33 < 0.8149  to the right, improve=167.1838, (0 missing)
##       X.1.0000.32 < 0.5483  to the right, improve=162.4746, (0 missing)
##       X.1.0000.34 < 0.8982  to the right, improve=154.5868, (0 missing)
##       X.1.0000.31 < 0.2716  to the right, improve=154.5296, (0 missing)
##       X.1.0000.35 < 0.8533  to the right, improve=144.2015, (0 missing)
##   Surrogate splits:
##       X.1.0000.32 < 0.5833  to the right, agree=0.986, adj=0.845, (0 split)
##       X.1.0000.34 < 0.8982  to the right, agree=0.982, adj=0.800, (0 split)
##       X.1.0000.31 < 0.2716  to the right, agree=0.976, adj=0.735, (0 split)
##       X.1.0000.35 < 0.9549  to the right, agree=0.962, adj=0.584, (0 split)
##       X.1.0000.30 < 0.0033  to the right, agree=0.961, adj=0.567, (0 split)
## 
## Node number 5: 2615 observations,    complexity param=0.03518426
##   predicted class=9   expected loss=0.9082218  P(node) =0.4192721
##     class counts:   232     5    20     5     6    29     9     8   240   188   233   239   235   233   240     8    17   240     2     7     0     6   162    18   229     4
##    probabilities: 0.089 0.002 0.008 0.002 0.002 0.011 0.003 0.003 0.092 0.072 0.089 0.091 0.090 0.089 0.092 0.003 0.007 0.092 0.001 0.003 0.000 0.002 0.062 0.007 0.088 0.002 
##   left son=10 (640 obs) right son=11 (1975 obs)
##   Primary splits:
##       X.1.0000.5  < -0.9985 to the right, improve=124.8855, (0 missing)
##       X.1.0000.20 < -0.9986 to the right, improve=124.8855, (0 missing)
##       X.1.0000.35 < -0.9967 to the right, improve=124.0233, (0 missing)
##       X.1.0000.50 < -0.9857 to the right, improve=117.9713, (0 missing)
##       X0.0790     < 0.1257  to the right, improve=116.6904, (0 missing)
##   Surrogate splits:
##       X.1.0000.20 < -0.9986 to the right, agree=1.000, adj=1.000, (0 split)
##       X.1.0000.35 < -0.9967 to the right, agree=0.998, adj=0.991, (0 split)
##       X.1.0000.50 < -0.9857 to the right, agree=0.991, adj=0.963, (0 split)
##       X.1.0000.4  < -0.9982 to the right, agree=0.928, adj=0.708, (0 split)
##       X.1.0000.19 < -0.9981 to the right, agree=0.928, adj=0.708, (0 split)
## 
## Node number 6: 677 observations,    complexity param=0.02943138
##   predicted class=8   expected loss=0.661743  P(node) =0.1085458
##     class counts:     0     4     1     0     1   198     0   229     0     0     0     1     0     0     0     1     0     0    18     0     0     1     1   221     0     1
##    probabilities: 0.000 0.006 0.001 0.000 0.001 0.292 0.000 0.338 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.000 0.000 0.027 0.000 0.000 0.001 0.001 0.326 0.000 0.001 
##   left son=12 (207 obs) right son=13 (470 obs)
##   Primary splits:
##       X.1.0000.36 < -0.3584 to the right, improve=155.1836, (0 missing)
##       X1.0000.21  < 0.9688  to the left,  improve=143.9341, (0 missing)
##       X.1.0000.44 < 0.8616  to the left,  improve=143.7075, (0 missing)
##       X.1.0000.37 < -0.4834 to the right, improve=135.5013, (0 missing)
##       X.1.0000.28 < -0.8225 to the left,  improve=129.7703, (0 missing)
##   Surrogate splits:
##       X.1.0000.37 < -0.1017 to the right, agree=0.931, adj=0.773, (0 split)
##       X.0.9268    < -0.4784 to the right, agree=0.904, adj=0.686, (0 split)
##       X.1.0000.28 < -0.842  to the left,  agree=0.873, adj=0.585, (0 split)
##       X.1.0000.29 < -0.866  to the left,  agree=0.871, adj=0.580, (0 split)
##       X.1.0000.21 < -0.9419 to the right, agree=0.843, adj=0.488, (0 split)
## 
## Node number 7: 232 observations
##   predicted class=19  expected loss=0.05172414  P(node) =0.03719737
##     class counts:     0     0     0     0     0     9     0     2     0     0     0     0     0     0     0     0     0     0   220     0     0     0     0     1     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.039 0.000 0.009 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.948 0.000 0.000 0.000 0.000 0.004 0.000 0.000 
## 
## Node number 8: 245 observations
##   predicted class=3   expected loss=0.1755102  P(node) =0.03928171
##     class counts:     0     0   202     0     0     0     0     0     0     0     0     0     0     0     0     0     4     0     0     2     0     0     0     0     0    37
##    probabilities: 0.000 0.000 0.824 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.016 0.000 0.000 0.008 0.000 0.000 0.000 0.000 0.000 0.151 
## 
## Node number 9: 2468 observations,    complexity param=0.03410038
##   predicted class=21  expected loss=0.9027553  P(node) =0.3957031
##     class counts:     7   231    17   235   233     2   231     1     0    52     7     0     5     7     0   231   219     0     0   231   240   233    77     0    11   198
##    probabilities: 0.003 0.094 0.007 0.095 0.094 0.001 0.094 0.000 0.000 0.021 0.003 0.000 0.002 0.003 0.000 0.094 0.089 0.000 0.000 0.094 0.097 0.094 0.031 0.000 0.004 0.080 
##   left son=18 (973 obs) right son=19 (1495 obs)
##   Primary splits:
##       X.1.0000.5  < -0.9983 to the left,  improve=135.5242, (0 missing)
##       X.1.0000.20 < -0.9981 to the left,  improve=135.5242, (0 missing)
##       X.1.0000.35 < -0.9984 to the left,  improve=134.0401, (0 missing)
##       X.1.0000.50 < -0.9857 to the left,  improve=132.9583, (0 missing)
##       X.1.0000.4  < -0.998  to the left,  improve=117.7130, (0 missing)
##   Surrogate splits:
##       X.1.0000.20 < -0.9981 to the left,  agree=1.000, adj=1.000, (0 split)
##       X.1.0000.35 < -0.9984 to the left,  agree=0.998, adj=0.995, (0 split)
##       X.1.0000.50 < -0.9857 to the left,  agree=0.982, adj=0.955, (0 split)
##       X.1.0000.4  < -0.998  to the left,  agree=0.917, adj=0.788, (0 split)
##       X.1.0000.19 < -0.9974 to the left,  agree=0.917, adj=0.788, (0 split)
## 
## Node number 10: 640 observations,    complexity param=0.016675
##   predicted class=11  expected loss=0.6515625  P(node) =0.1026134
##     class counts:    19     0    20     0     0     3     8     1    12   183   223    17    20    18    16     4    17    13     1     6     0     3    31     1    20     4
##    probabilities: 0.030 0.000 0.031 0.000 0.000 0.005 0.013 0.002 0.019 0.286 0.348 0.027 0.031 0.028 0.025 0.006 0.027 0.020 0.002 0.009 0.000 0.005 0.048 0.002 0.031 0.006 
##   left son=20 (142 obs) right son=21 (498 obs)
##   Primary splits:
##       X1.0000.15 < 0.5336  to the left,  improve=60.82991, (0 missing)
##       X1.0000.14 < 0.5043  to the left,  improve=58.22395, (0 missing)
##       X1.0000.16 < 0.5369  to the left,  improve=55.48650, (0 missing)
##       X1.0000.17 < 0.5652  to the left,  improve=53.39219, (0 missing)
##       X0.6000    < 0.1909  to the left,  improve=46.37470, (0 missing)
##   Surrogate splits:
##       X1.0000.14 < 0.5068  to the left,  agree=0.991, adj=0.958, (0 split)
##       X1.0000.16 < 0.5461  to the left,  agree=0.981, adj=0.915, (0 split)
##       X1.0000.17 < 0.5652  to the left,  agree=0.975, adj=0.887, (0 split)
##       X1.0000.18 < 0.6015  to the left,  agree=0.955, adj=0.796, (0 split)
##       X1.0000.19 < 0.607   to the left,  agree=0.919, adj=0.634, (0 split)
## 
## Node number 11: 1975 observations,    complexity param=0.02651326
##   predicted class=9   expected loss=0.884557  P(node) =0.3166586
##     class counts:   213     5     0     5     6    26     1     7   228     5    10   222   215   215   224     4     0   227     1     1     0     3   131    17   209     0
##    probabilities: 0.108 0.003 0.000 0.003 0.003 0.013 0.001 0.004 0.115 0.003 0.005 0.112 0.109 0.109 0.113 0.002 0.000 0.115 0.001 0.001 0.000 0.002 0.066 0.009 0.106 0.000 
##   left son=22 (1823 obs) right son=23 (152 obs)
##   Primary splits:
##       X0.1718    < 0.7495  to the left,  improve=125.46140, (0 missing)
##       X1.0000.12 < 0.6143  to the right, improve=106.13760, (0 missing)
##       X.0.2000   < 0.3958  to the left,  improve=104.80330, (0 missing)
##       X0.0790    < 0.1257  to the right, improve= 99.57897, (0 missing)
##       X0.3152    < -0.3407 to the right, improve= 90.40091, (0 missing)
##   Surrogate splits:
##       X1.0000.15 < 0.4082  to the right, agree=0.955, adj=0.414, (0 split)
##       X1.0000.16 < 0.4582  to the right, agree=0.955, adj=0.414, (0 split)
##       X1.0000.14 < 0.4429  to the right, agree=0.954, adj=0.408, (0 split)
##       X1.0000.17 < 0.4708  to the right, agree=0.954, adj=0.401, (0 split)
##       X1.0000.18 < 0.4869  to the right, agree=0.951, adj=0.362, (0 split)
## 
## Node number 12: 207 observations
##   predicted class=6   expected loss=0.1352657  P(node) =0.03318903
##     class counts:     0     0     0     0     1   179     0     9     0     0     0     0     0     0     0     1     0     0    16     0     0     0     0     1     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.005 0.865 0.000 0.043 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.005 0.000 0.000 0.077 0.000 0.000 0.000 0.000 0.005 0.000 0.000 
## 
## Node number 13: 470 observations,    complexity param=0.02943138
##   predicted class=8   expected loss=0.5319149  P(node) =0.07535674
##     class counts:     0     4     1     0     0    19     0   220     0     0     0     1     0     0     0     0     0     0     2     0     0     1     1   220     0     1
##    probabilities: 0.000 0.009 0.002 0.000 0.000 0.040 0.000 0.468 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.004 0.000 0.000 0.002 0.002 0.468 0.000 0.002 
##   left son=26 (217 obs) right son=27 (253 obs)
##   Primary splits:
##       X1.0000.21  < 0.9998  to the left,  improve=143.7542, (0 missing)
##       X.1.0000.44 < 0.8616  to the left,  improve=129.4166, (0 missing)
##       X0.6154     < 0.32    to the right, improve=127.7065, (0 missing)
##       X1.0000.20  < 0.9718  to the left,  improve=125.5350, (0 missing)
##       X1.0000.19  < 0.9949  to the left,  improve=115.7338, (0 missing)
##   Surrogate splits:
##       X1.0000.20 < 0.9718  to the left,  agree=0.966, adj=0.926, (0 split)
##       X1.0000.19 < 0.9949  to the left,  agree=0.953, adj=0.899, (0 split)
##       X1.0000.18 < 0.9742  to the left,  agree=0.940, adj=0.871, (0 split)
##       X1.0000.17 < 0.9908  to the left,  agree=0.938, adj=0.866, (0 split)
##       X1.0000.16 < 0.9913  to the left,  agree=0.932, adj=0.853, (0 split)
## 
## Node number 18: 973 observations,    complexity param=0.02151076
##   predicted class=5   expected loss=0.7728674  P(node) =0.1560045
##     class counts:     7   205     0   199   221     0     5     1     0     1     3     0     3     7     0    40     1     0     0    14   184    57    12     0    10     3
##    probabilities: 0.007 0.211 0.000 0.205 0.227 0.000 0.005 0.001 0.000 0.001 0.003 0.000 0.003 0.007 0.000 0.041 0.001 0.000 0.000 0.014 0.189 0.059 0.012 0.000 0.010 0.003 
##   left son=36 (808 obs) right son=37 (165 obs)
##   Primary splits:
##       X1.0000.6  < -0.0303 to the right, improve=99.32718, (0 missing)
##       X1.0000.7  < -0.1172 to the right, improve=94.48481, (0 missing)
##       X1.0000.5  < 0.1598  to the right, improve=85.96639, (0 missing)
##       X1.0000.11 < 0.3039  to the right, improve=58.06874, (0 missing)
##       X1.0000.12 < 0.9857  to the right, improve=49.49880, (0 missing)
##   Surrogate splits:
##       X1.0000.7  < -0.2946 to the right, agree=0.911, adj=0.473, (0 split)
##       X1.0000.5  < -0.1693 to the right, agree=0.888, adj=0.339, (0 split)
##       X1.0000.11 < -0.0367 to the right, agree=0.875, adj=0.261, (0 split)
##       X1.0000.4  < -0.0095 to the right, agree=0.868, adj=0.224, (0 split)
##       X1.0000.3  < -0.1125 to the right, agree=0.854, adj=0.139, (0 split)
## 
## Node number 19: 1495 observations,    complexity param=0.02651326
##   predicted class=7   expected loss=0.8488294  P(node) =0.2396986
##     class counts:     0    26    17    36    12     2   226     0     0    51     4     0     2     0     0   191   218     0     0   217    56   176    65     0     1   195
##    probabilities: 0.000 0.017 0.011 0.024 0.008 0.001 0.151 0.000 0.000 0.034 0.003 0.000 0.001 0.000 0.000 0.128 0.146 0.000 0.000 0.145 0.037 0.118 0.043 0.000 0.001 0.130 
##   left son=38 (1191 obs) right son=39 (304 obs)
##   Primary splits:
##       X1.0000.5  < 0.1433  to the right, improve=90.10207, (0 missing)
##       X1.0000.29 < 0.3596  to the right, improve=88.11785, (0 missing)
##       X1.0000.6  < -0.0423 to the right, improve=79.08815, (0 missing)
##       X.0.4000   < 0.2143  to the right, improve=78.10916, (0 missing)
##       X1.0000.7  < -0.0703 to the right, improve=74.22847, (0 missing)
##   Surrogate splits:
##       X1.0000.11 < 0.1516  to the right, agree=0.906, adj=0.536, (0 split)
##       X1.0000.3  < 0.2347  to the right, agree=0.903, adj=0.523, (0 split)
##       X1.0000.6  < -0.2143 to the right, agree=0.880, adj=0.408, (0 split)
##       X1.0000.7  < -0.2014 to the right, agree=0.870, adj=0.359, (0 split)
##       X1.0000.2  < 0.1382  to the right, agree=0.836, adj=0.194, (0 split)
## 
## Node number 20: 142 observations
##   predicted class=10  expected loss=0.2042254  P(node) =0.02276736
##     class counts:     0     0     3     0     0     0     1     0     0   113    13     0     0     2     0     1     4     0     0     3     0     0     0     0     1     1
##    probabilities: 0.000 0.000 0.021 0.000 0.000 0.000 0.007 0.000 0.000 0.796 0.092 0.000 0.000 0.014 0.000 0.007 0.028 0.000 0.000 0.021 0.000 0.000 0.000 0.000 0.007 0.007 
## 
## Node number 21: 498 observations
##   predicted class=11  expected loss=0.5783133  P(node) =0.07984608
##     class counts:    19     0    17     0     0     3     7     1    12    70   210    17    20    16    16     3    13    13     1     3     0     3    31     1    19     3
##    probabilities: 0.038 0.000 0.034 0.000 0.000 0.006 0.014 0.002 0.024 0.141 0.422 0.034 0.040 0.032 0.032 0.006 0.026 0.026 0.002 0.006 0.000 0.006 0.062 0.002 0.038 0.006 
## 
## Node number 22: 1823 observations,    complexity param=0.02651326
##   predicted class=9   expected loss=0.8749314  P(node) =0.292288
##     class counts:   213     5     0     4     6    26     0     7   228     4     9   222   213   214   224     3     0   227     1     1     0     2   129    17    68     0
##    probabilities: 0.117 0.003 0.000 0.002 0.003 0.014 0.000 0.004 0.125 0.002 0.005 0.122 0.117 0.117 0.123 0.002 0.000 0.125 0.001 0.001 0.000 0.001 0.071 0.009 0.037 0.000 
##   left son=44 (1299 obs) right son=45 (524 obs)
##   Primary splits:
##       X.0.2000    < 0.4505  to the left,  improve=102.18040, (0 missing)
##       X0.0790     < 0.2487  to the right, improve=100.18010, (0 missing)
##       X.1.0000.52 < 0.4429  to the left,  improve= 95.14058, (0 missing)
##       X0.3152     < -0.3407 to the right, improve= 86.04142, (0 missing)
##       X1.0000.6   < 0.0759  to the right, improve= 69.16199, (0 missing)
##   Surrogate splits:
##       X.0.6000   < 0.496   to the left,  agree=0.830, adj=0.410, (0 split)
##       X1.0000.26 < -0.895  to the right, agree=0.728, adj=0.055, (0 split)
##       X1.0000.28 < -0.5833 to the right, agree=0.716, adj=0.011, (0 split)
##       X1.0000.2  < -0.6876 to the right, agree=0.715, adj=0.008, (0 split)
##       X0.1718    < -0.4862 to the right, agree=0.714, adj=0.006, (0 split)
## 
## Node number 23: 152 observations
##   predicted class=25  expected loss=0.07236842  P(node) =0.02437069
##     class counts:     0     0     0     1     0     0     1     0     0     1     1     0     2     1     0     1     0     0     0     0     0     1     2     0   141     0
##    probabilities: 0.000 0.000 0.000 0.007 0.000 0.000 0.007 0.000 0.000 0.007 0.007 0.000 0.013 0.007 0.000 0.007 0.000 0.000 0.000 0.000 0.000 0.007 0.013 0.000 0.928 0.000 
## 
## Node number 26: 217 observations
##   predicted class=8   expected loss=0.1013825  P(node) =0.03479237
##     class counts:     0     4     1     0     0     3     0   195     0     0     0     0     0     0     0     0     0     0     0     0     0     1     0    12     0     1
##    probabilities: 0.000 0.018 0.005 0.000 0.000 0.014 0.000 0.899 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.005 0.000 0.055 0.000 0.005 
## 
## Node number 27: 253 observations
##   predicted class=24  expected loss=0.1778656  P(node) =0.04056437
##     class counts:     0     0     0     0     0    16     0    25     0     0     0     1     0     0     0     0     0     0     2     0     0     0     1   208     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.063 0.000 0.099 0.000 0.000 0.000 0.004 0.000 0.000 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.000 0.004 0.822 0.000 0.000 
## 
## Node number 36: 808 observations,    complexity param=0.01884275
##   predicted class=5   expected loss=0.7339109  P(node) =0.1295495
##     class counts:     7   198     0   196   215     0     5     1     0     1     3     0     0     5     0    40     1     0     0    14    49    55     7     0     8     3
##    probabilities: 0.009 0.245 0.000 0.243 0.266 0.000 0.006 0.001 0.000 0.001 0.004 0.000 0.000 0.006 0.000 0.050 0.001 0.000 0.000 0.017 0.061 0.068 0.009 0.000 0.010 0.004 
##   left son=72 (348 obs) right son=73 (460 obs)
##   Primary splits:
##       X.0.4000   < -0.9857 to the left,  improve=47.96012, (0 missing)
##       X1.0000.24 < 0.8707  to the right, improve=38.10847, (0 missing)
##       X1.0000.23 < 0.8493  to the right, improve=37.60073, (0 missing)
##       X1.0000.22 < 0.9322  to the right, improve=37.58566, (0 missing)
##       X1.0000.25 < 0.8652  to the right, improve=32.93544, (0 missing)
##   Surrogate splits:
##       X0.5150    < 0.7864  to the right, agree=0.594, adj=0.057, (0 split)
##       X0.5578    < 0.7824  to the right, agree=0.592, adj=0.052, (0 split)
##       X0.6000.1  < 0.2747  to the left,  agree=0.590, adj=0.049, (0 split)
##       X1.0000.22 < 0.9787  to the right, agree=0.590, adj=0.049, (0 split)
##       X1.0000.23 < 0.9733  to the right, agree=0.589, adj=0.046, (0 split)
## 
## Node number 37: 165 observations
##   predicted class=21  expected loss=0.1818182  P(node) =0.02645503
##     class counts:     0     7     0     3     6     0     0     0     0     0     0     0     3     2     0     0     0     0     0     0   135     2     5     0     2     0
##    probabilities: 0.000 0.042 0.000 0.018 0.036 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.018 0.012 0.000 0.000 0.000 0.000 0.000 0.000 0.818 0.012 0.030 0.000 0.012 0.000 
## 
## Node number 38: 1191 observations,    complexity param=0.02417876
##   predicted class=7   expected loss=0.8161209  P(node) =0.1909572
##     class counts:     0    24    16    34    12     1   219     0     0    36     4     0     0     0     0   184    52     0     0   209    16   170    29     0     0   185
##    probabilities: 0.000 0.020 0.013 0.029 0.010 0.001 0.184 0.000 0.000 0.030 0.003 0.000 0.000 0.000 0.000 0.154 0.044 0.000 0.000 0.175 0.013 0.143 0.024 0.000 0.000 0.155 
##   left son=76 (956 obs) right son=77 (235 obs)
##   Primary splits:
##       X1.0000.29  < 0.3596  to the right, improve=82.49215, (0 missing)
##       X.0.4000    < -0.3571 to the right, improve=81.04785, (0 missing)
##       X1.0000.30  < 0       to the left,  improve=80.81691, (0 missing)
##       X.1.0000.16 < -0.9841 to the left,  improve=80.60869, (0 missing)
##       X.1.0000.1  < -0.9911 to the left,  improve=80.57083, (0 missing)
##   Surrogate splits:
##       X0.5272     < 0.2668  to the right, agree=0.868, adj=0.332, (0 split)
##       X.1.0000.2  < -0.7237 to the left,  agree=0.829, adj=0.132, (0 split)
##       X1.0000.28  < 0.7151  to the left,  agree=0.828, adj=0.128, (0 split)
##       X.1.0000.18 < -0.6414 to the left,  agree=0.827, adj=0.123, (0 split)
##       X.1.0000.1  < -0.77   to the left,  agree=0.826, adj=0.119, (0 split)
## 
## Node number 39: 304 observations
##   predicted class=17  expected loss=0.4539474  P(node) =0.04874138
##     class counts:     0     2     1     2     0     1     7     0     0    15     0     0     2     0     0     7   166     0     0     8    40     6    36     0     1    10
##    probabilities: 0.000 0.007 0.003 0.007 0.000 0.003 0.023 0.000 0.000 0.049 0.000 0.000 0.007 0.000 0.000 0.023 0.546 0.000 0.000 0.026 0.132 0.020 0.118 0.000 0.003 0.033 
## 
## Node number 44: 1299 observations,    complexity param=0.02376188
##   predicted class=14  expected loss=0.8360277  P(node) =0.2082732
##     class counts:   210     5     0     4     6    22     0     7    36     4     9   203   208   213    80     3     0   105     1     1     0     2   100    15    65     0
##    probabilities: 0.162 0.004 0.000 0.003 0.005 0.017 0.000 0.005 0.028 0.003 0.007 0.156 0.160 0.164 0.062 0.002 0.000 0.081 0.001 0.001 0.000 0.002 0.077 0.012 0.050 0.000 
##   left son=88 (873 obs) right son=89 (426 obs)
##   Primary splits:
##       X0.3152     < -0.3406 to the right, improve=74.06612, (0 missing)
##       X.1.0000.52 < 0.6143  to the left,  improve=73.48581, (0 missing)
##       X0.0790     < 0.0783  to the right, improve=72.25580, (0 missing)
##       X.0.9714    < -0.7571 to the left,  improve=49.16143, (0 missing)
##       X1.0000.6   < 0.1691  to the right, improve=47.59252, (0 missing)
##   Surrogate splits:
##       X.0.9714    < -0.2143 to the left,  agree=0.785, adj=0.345, (0 split)
##       X.1.0000.52 < 0.9143  to the left,  agree=0.702, adj=0.092, (0 split)
##       X.0.6000    < 0.7272  to the left,  agree=0.680, adj=0.023, (0 split)
##       X.0.2000    < -0.7254 to the right, agree=0.679, adj=0.021, (0 split)
##       X0.0790     < 0.9622  to the left,  agree=0.677, adj=0.016, (0 split)
## 
## Node number 45: 524 observations,    complexity param=0.01750875
##   predicted class=9   expected loss=0.6335878  P(node) =0.08401475
##     class counts:     3     0     0     0     0     4     0     0   192     0     0    19     5     1   144     0     0   122     0     0     0     0    29     2     3     0
##    probabilities: 0.006 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.366 0.000 0.000 0.036 0.010 0.002 0.275 0.000 0.000 0.233 0.000 0.000 0.000 0.000 0.055 0.004 0.006 0.000 
##   left son=90 (254 obs) right son=91 (270 obs)
##   Primary splits:
##       X0.0790   < 0.2656  to the right, improve=90.34397, (0 missing)
##       X1.0000.6 < 0.0704  to the right, improve=73.59495, (0 missing)
##       X1.0000.7 < 0.0746  to the right, improve=72.13282, (0 missing)
##       X1.0000.5 < 0.2708  to the right, improve=52.15184, (0 missing)
##       X.0.9268  < -0.7195 to the right, improve=49.50023, (0 missing)
##   Surrogate splits:
##       X1.0000.6  < 0.0368  to the right, agree=0.828, adj=0.646, (0 split)
##       X1.0000.7  < -0.1719 to the right, agree=0.805, adj=0.598, (0 split)
##       X1.0000.5  < 0.0936  to the right, agree=0.788, adj=0.563, (0 split)
##       X1.0000.3  < 0.2066  to the right, agree=0.754, adj=0.492, (0 split)
##       X1.0000.11 < 0.2546  to the right, agree=0.742, adj=0.469, (0 split)
## 
## Node number 72: 348 observations
##   predicted class=2   expected loss=0.5402299  P(node) =0.05579606
##     class counts:     3   160     0    28   111     0     1     1     0     0     0     0     0     1     0     3     0     0     0     1    15    13     3     0     7     1
##    probabilities: 0.009 0.460 0.000 0.080 0.319 0.000 0.003 0.003 0.000 0.000 0.000 0.000 0.000 0.003 0.000 0.009 0.000 0.000 0.000 0.003 0.043 0.037 0.009 0.000 0.020 0.003 
## 
## Node number 73: 460 observations
##   predicted class=4   expected loss=0.6347826  P(node) =0.07375341
##     class counts:     4    38     0   168   104     0     4     0     0     1     3     0     0     4     0    37     1     0     0    13    34    42     4     0     1     2
##    probabilities: 0.009 0.083 0.000 0.365 0.226 0.000 0.009 0.000 0.000 0.002 0.007 0.000 0.000 0.009 0.000 0.080 0.002 0.000 0.000 0.028 0.074 0.091 0.009 0.000 0.002 0.004 
## 
## Node number 76: 956 observations,    complexity param=0.0156745
##   predicted class=7   expected loss=0.7730126  P(node) =0.1532788
##     class counts:     0    15    10    25    10     1   217     0     0    36     4     0     0     0     0   176    50     0     0   189    16   151    18     0     0    38
##    probabilities: 0.000 0.016 0.010 0.026 0.010 0.001 0.227 0.000 0.000 0.038 0.004 0.000 0.000 0.000 0.000 0.184 0.052 0.000 0.000 0.198 0.017 0.158 0.019 0.000 0.000 0.040 
##   left son=152 (507 obs) right son=153 (449 obs)
##   Primary splits:
##       X.0.4000    < 0.7     to the right, improve=46.90970, (0 missing)
##       X1.0000.27  < 0.3495  to the left,  improve=44.22357, (0 missing)
##       X.1.0000.50 < 0.9571  to the right, improve=40.72380, (0 missing)
##       X1.0000.28  < 0.3703  to the left,  improve=35.34098, (0 missing)
##       X1.0000.30  < 0       to the left,  improve=32.07147, (0 missing)
##   Surrogate splits:
##       X.1.0000.20 < -0.9148 to the left,  agree=0.804, adj=0.584, (0 split)
##       X.1.0000.48 < -0.9857 to the left,  agree=0.771, adj=0.512, (0 split)
##       X.1.0000.18 < -0.9903 to the left,  agree=0.766, adj=0.501, (0 split)
##       X.1.0000.3  < -0.9924 to the left,  agree=0.764, adj=0.497, (0 split)
##       X.1.0000.33 < -0.9975 to the left,  agree=0.764, adj=0.497, (0 split)
## 
## Node number 77: 235 observations
##   predicted class=26  expected loss=0.3744681  P(node) =0.03767837
##     class counts:     0     9     6     9     2     0     2     0     0     0     0     0     0     0     0     8     2     0     0    20     0    19    11     0     0   147
##    probabilities: 0.000 0.038 0.026 0.038 0.009 0.000 0.009 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.034 0.009 0.000 0.000 0.085 0.000 0.081 0.047 0.000 0.000 0.626 
## 
## Node number 88: 873 observations,    complexity param=0.02376188
##   predicted class=1   expected loss=0.7880871  P(node) =0.1399711
##     class counts:   185     5     0     4     6    22     0     6    36     4     8   184    48    70    72     3     0   103     1     1     0     2    33    15    65     0
##    probabilities: 0.212 0.006 0.000 0.005 0.007 0.025 0.000 0.007 0.041 0.005 0.009 0.211 0.055 0.080 0.082 0.003 0.000 0.118 0.001 0.001 0.000 0.002 0.038 0.017 0.074 0.000 
##   left son=176 (543 obs) right son=177 (330 obs)
##   Primary splits:
##       X0.0790   < 0.0854  to the right, improve=76.36535, (0 missing)
##       X1.0000.7 < 0.0201  to the right, improve=50.45657, (0 missing)
##       X0.5958   < 0.6467  to the right, improve=49.41710, (0 missing)
##       X1.0000.5 < 0.4168  to the right, improve=46.33057, (0 missing)
##       X1.0000.6 < 0.0779  to the right, improve=45.58592, (0 missing)
##   Surrogate splits:
##       X0.5958   < 0.348   to the right, agree=0.772, adj=0.397, (0 split)
##       X1.0000.3 < 0.2368  to the right, agree=0.764, adj=0.376, (0 split)
##       X1.0000.5 < 0.0569  to the right, agree=0.749, adj=0.336, (0 split)
##       X1.0000.2 < 0.3379  to the right, agree=0.741, adj=0.315, (0 split)
##       X1.0000.7 < -0.232  to the right, agree=0.727, adj=0.279, (0 split)
## 
## Node number 89: 426 observations,    complexity param=0.0103385
##   predicted class=13  expected loss=0.6244131  P(node) =0.06830207
##     class counts:    25     0     0     0     0     0     0     1     0     0     1    19   160   143     8     0     0     2     0     0     0     0    67     0     0     0
##    probabilities: 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.000 0.000 0.002 0.045 0.376 0.336 0.019 0.000 0.000 0.005 0.000 0.000 0.000 0.000 0.157 0.000 0.000 0.000 
##   left son=178 (361 obs) right son=179 (65 obs)
##   Primary splits:
##       X.1.0000.52 < 0.6857  to the left,  improve=68.83559, (0 missing)
##       X.0.6000    < 0.3764  to the left,  improve=40.76568, (0 missing)
##       X1.0000.1   < 0.248   to the right, improve=34.40573, (0 missing)
##       X.1.0000.51 < -0.2429 to the left,  improve=33.83566, (0 missing)
##       X1.0000.24  < -0.4582 to the right, improve=18.04496, (0 missing)
##   Surrogate splits:
##       X.1.0000.51 < 0.7     to the left,  agree=0.915, adj=0.446, (0 split)
##       X1.0000.1   < 0.0779  to the right, agree=0.908, adj=0.400, (0 split)
##       X.0.6000    < 0.4425  to the left,  agree=0.908, adj=0.400, (0 split)
##       X1.0000.2   < -0.2367 to the right, agree=0.869, adj=0.138, (0 split)
##       X0.8222     < -0.4346 to the right, agree=0.859, adj=0.077, (0 split)
## 
## Node number 90: 254 observations
##   predicted class=9   expected loss=0.3031496  P(node) =0.04072471
##     class counts:     3     0     0     0     0     3     0     0   177     0     0     3     4     1    24     0     0     9     0     0     0     0    25     2     3     0
##    probabilities: 0.012 0.000 0.000 0.000 0.000 0.012 0.000 0.000 0.697 0.000 0.000 0.012 0.016 0.004 0.094 0.000 0.000 0.035 0.000 0.000 0.000 0.000 0.098 0.008 0.012 0.000 
## 
## Node number 91: 270 observations
##   predicted class=15  expected loss=0.5555556  P(node) =0.04329004
##     class counts:     0     0     0     0     0     1     0     0    15     0     0    16     1     0   120     0     0   113     0     0     0     0     4     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.004 0.000 0.000 0.056 0.000 0.000 0.059 0.004 0.000 0.444 0.000 0.000 0.419 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000 
## 
## Node number 152: 507 observations
##   predicted class=7   expected loss=0.6587771  P(node) =0.08128908
##     class counts:     0     1     0     2     4     1   173     0     0    31     3     0     0     0     0   138     9     0     0   113     2    13    17     0     0     0
##    probabilities: 0.000 0.002 0.000 0.004 0.008 0.002 0.341 0.000 0.000 0.061 0.006 0.000 0.000 0.000 0.000 0.272 0.018 0.000 0.000 0.223 0.004 0.026 0.034 0.000 0.000 0.000 
## 
## Node number 153: 449 observations,    complexity param=0.01017175
##   predicted class=22  expected loss=0.6926503  P(node) =0.07198974
##     class counts:     0    14    10    23     6     0    44     0     0     5     1     0     0     0     0    38    41     0     0    76    14   138     1     0     0    38
##    probabilities: 0.000 0.031 0.022 0.051 0.013 0.000 0.098 0.000 0.000 0.011 0.002 0.000 0.000 0.000 0.000 0.085 0.091 0.000 0.000 0.169 0.031 0.307 0.002 0.000 0.000 0.085 
##   left son=306 (189 obs) right son=307 (260 obs)
##   Primary splits:
##       X.1.0000.50 < 0.9285  to the right, improve=44.79498, (0 missing)
##       X.1.0000.49 < 0.9714  to the right, improve=40.48966, (0 missing)
##       X.1.0000.20 < -0.5899 to the right, improve=35.73415, (0 missing)
##       X.1.0000.35 < -0.1467 to the right, improve=31.73381, (0 missing)
##       X1.0000.6   < -0.0508 to the right, improve=19.73595, (0 missing)
##   Surrogate splits:
##       X.1.0000.49 < 0.7     to the right, agree=0.875, adj=0.704, (0 split)
##       X.0.4000    < -0.5571 to the right, agree=0.713, adj=0.317, (0 split)
##       X.1.0000.20 < -0.6108 to the right, agree=0.708, adj=0.307, (0 split)
##       X.1.0000.16 < -0.984  to the left,  agree=0.699, adj=0.286, (0 split)
##       X.1.0000.35 < -0.6617 to the right, agree=0.695, adj=0.275, (0 split)
## 
## Node number 176: 543 observations
##   predicted class=1   expected loss=0.6629834  P(node) =0.08706109
##     class counts:   183     5     0     2     4    20     0     6    32     4     8    29    36    54    33     3     0    12     1     1     0     2    31    15    62     0
##    probabilities: 0.337 0.009 0.000 0.004 0.007 0.037 0.000 0.011 0.059 0.007 0.015 0.053 0.066 0.099 0.061 0.006 0.000 0.022 0.002 0.002 0.000 0.004 0.057 0.028 0.114 0.000 
## 
## Node number 177: 330 observations
##   predicted class=12  expected loss=0.530303  P(node) =0.05291005
##     class counts:     2     0     0     2     2     2     0     0     4     0     0   155    12    16    39     0     0    91     0     0     0     0     2     0     3     0
##    probabilities: 0.006 0.000 0.000 0.006 0.006 0.006 0.000 0.000 0.012 0.000 0.000 0.470 0.036 0.048 0.118 0.000 0.000 0.276 0.000 0.000 0.000 0.000 0.006 0.000 0.009 0.000 
## 
## Node number 178: 361 observations
##   predicted class=13  expected loss=0.5595568  P(node) =0.05788039
##     class counts:    25     0     0     0     0     0     0     1     0     0     1    19   159   142     8     0     0     2     0     0     0     0     4     0     0     0
##    probabilities: 0.069 0.000 0.000 0.000 0.000 0.000 0.000 0.003 0.000 0.000 0.003 0.053 0.440 0.393 0.022 0.000 0.000 0.006 0.000 0.000 0.000 0.000 0.011 0.000 0.000 0.000 
## 
## Node number 179: 65 observations
##   predicted class=23  expected loss=0.03076923  P(node) =0.01042168
##     class counts:     0     0     0     0     0     0     0     0     0     0     0     0     1     1     0     0     0     0     0     0     0     0    63     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.015 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.969 0.000 0.000 0.000 
## 
## Node number 306: 189 observations
##   predicted class=20  expected loss=0.6296296  P(node) =0.03030303
##     class counts:     0     0     0     1     0     0    41     0     0     4     1     0     0     0     0    28    28     0     0    70     1     9     1     0     0     5
##    probabilities: 0.000 0.000 0.000 0.005 0.000 0.000 0.217 0.000 0.000 0.021 0.005 0.000 0.000 0.000 0.000 0.148 0.148 0.000 0.000 0.370 0.005 0.048 0.005 0.000 0.000 0.026 
## 
## Node number 307: 260 observations
##   predicted class=22  expected loss=0.5038462  P(node) =0.04168671
##     class counts:     0    14    10    22     6     0     3     0     0     1     0     0     0     0     0    10    13     0     0     6    13   129     0     0     0    33
##    probabilities: 0.000 0.054 0.038 0.085 0.023 0.000 0.012 0.000 0.000 0.004 0.000 0.000 0.000 0.000 0.000 0.038 0.050 0.000 0.000 0.023 0.050 0.496 0.000 0.000 0.000 0.127
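The figures in each node block above can be reproduced by hand from the class counts. The following is a minimal sketch in base R, not part of the original analysis. It assumes default priors and the default 0/1 loss, so that the expected loss is the share of observations outside the majority class and P(node) is the node's share of the 6237 training rows. `node_stats` is a hypothetical helper, and the counts are copied from the block for node 7.

```r
# Hypothetical helper: recompute the per-node figures that summary() prints,
# assuming default priors and 0/1 loss.
node_stats <- function(counts, n_total) {
  n_node <- sum(counts)
  list(
    predicted     = which.max(counts),          # index of the majority class
    expected_loss = 1 - max(counts) / n_node,   # misclassified share in the node
    p_node        = n_node / n_total            # share of all training rows
  )
}

# Class counts printed for node 7 (26 classes, 232 observations).
counts7 <- c(0, 0, 0, 0, 0, 9, 0, 2, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 220, 0, 0, 0, 0, 1, 0, 0)

s <- node_stats(counts7, n_total = 6237)
print(s)
```

For node 7 this gives 12/232 as the expected loss and 232/6237 as P(node), which agrees with the printed values to the displayed precision.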
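prp(tree1, type = 5, extra = 104, nn = TRUE, tweak = 1.7)
# predict.rpart() selects its output with `type`, not `method`; the default for a
# classification tree is the matrix of class probabilities, which is used here
y <- predict(tree1, newdata = dataTE)
# column index of the most probable class for each test row
preds <- apply(y, 1, which.max)
tab <- table(preds, dataTE$X1.)
# diag(tab) counts correct predictions only if tab is square with matching row and column
# classes; the tree has 22 leaves, so at most 22 of the 26 classes can appear as rows
sum(diag(tab)) / length(dataTE$X1.)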
## [1] 0.04172015
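The value above comes from the diagonal of `table(preds, dataTE$X1.)`, which equals the number of correct predictions only when every class appears in both the rows and the columns. The following is a minimal sketch in base R of an accuracy calculation that does not depend on the table's shape. It is not the author's code: `aligned_accuracy` is a hypothetical helper, and the toy vectors are made up to stand in for `preds` and `dataTE$X1.`.

```r
# Hypothetical helper: accuracy from predicted and true labels, with both
# factors forced onto the same levels so the confusion table is square.
aligned_accuracy <- function(pred, truth) {
  lv <- sort(unique(c(as.character(pred), as.character(truth))))
  tab <- table(factor(as.character(pred), levels = lv),
               factor(as.character(truth), levels = lv))
  sum(diag(tab)) / length(truth)
}

# Toy data (made up): class 2 occurs in the truth but is never predicted.
truth <- c(1, 2, 3, 3, 1, 2)
pred  <- c(1, 3, 3, 3, 1, 1)

# The unaligned table is 2 x 3, so its diagonal pairs row "3" with column "2".
naive_tab <- table(pred, truth)
naive <- sum(diag(naive_tab)) / length(truth)

aligned <- aligned_accuracy(pred, truth)
direct  <- mean(pred == truth)   # same quantity without building a table

print(c(naive = naive, aligned = aligned, direct = direct))
```

On this toy input the naive diagonal and the aligned calculation disagree, while `mean(pred == truth)` matches the aligned result. With an rpart fit, `predict(tree1, newdata = dataTE, type = "class")` returns the predicted labels directly and could be passed as `pred`.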
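# refit with the depth capped at 12 and 10-fold cross-validation (both passed on to rpart.control)
tree1 <- rpart(X1. ~ ., method = "class", data = dataTR, maxdepth = 12, xval = 10)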
summary(tree1)
## Call:
## rpart(formula = X1. ~ ., data = dataTR, method = "class", maxdepth = 12, 
##     xval = 10)
##   n= 6237 
## 
##            CP nsplit rel error    xerror        xstd
## 1  0.03985326      0 1.0000000 1.0130065 0.002094637
## 2  0.03518426      2 0.9202935 0.9363015 0.003945919
## 3  0.03410038      4 0.8499250 0.8759380 0.004800421
## 4  0.02943138      6 0.7817242 0.7950642 0.005588009
## 5  0.02651326      8 0.7228614 0.7221944 0.006066436
## 6  0.02417876     11 0.6433217 0.6444889 0.006393080
## 7  0.02376188     12 0.6191429 0.6276472 0.006441914
## 8  0.02151076     14 0.5716191 0.5999666 0.006506214
## 9  0.01884275     15 0.5501084 0.5716191 0.006552007
## 10 0.01750875     16 0.5312656 0.5447724 0.006577046
## 11 0.01667500     17 0.5137569 0.5329331 0.006582482
## 12 0.01567450     18 0.4970819 0.5132566 0.006583960
## 13 0.01033850     19 0.4814074 0.4955811 0.006577246
## 14 0.01017175     20 0.4710689 0.4857429 0.006570204
## 15 0.01000000     21 0.4608971 0.4840754 0.006568775
## 
## Variable importance
##  X1.0000.17  X1.0000.16  X1.0000.18 X.1.0000.35  X1.0000.19  X1.0000.21 
##           4           4           4           4           4           4 
##  X1.0000.20 X.1.0000.36 X.1.0000.37 X.1.0000.20 X.1.0000.50 X.1.0000.29 
##           3           3           3           3           3           3 
##  X.1.0000.5 X.1.0000.43    X.0.9268 X.1.0000.42 X.1.0000.44   X1.0000.5 
##           3           2           2           2           2           2 
## X.1.0000.19  X.1.0000.4   X1.0000.6 X.1.0000.41 X.1.0000.33 X.1.0000.38 
##           2           2           2           2           2           2 
## X.1.0000.14     X0.0790 X.1.0000.39   X1.0000.7 X.1.0000.32   X1.0000.3 
##           2           2           2           2           1           1 
## X.1.0000.34     X0.1718 X.1.0000.31  X1.0000.11  X1.0000.15  X1.0000.14 
##           1           1           1           1           1           1 
##    X.0.4000 X.1.0000.40    X.0.2000 X.1.0000.30 X.1.0000.28  X1.0000.29 
##           1           1           1           1           1           1 
## X.1.0000.21 X.1.0000.52     X0.3152    X.0.6000   X1.0000.2 
##           1           1           1           1           1 
## 
## Node number 1: 6237 observations,    complexity param=0.03985326
##   predicted class=2   expected loss=0.96152  P(node) =1
##     class counts:   239   240   240   240   240   238   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240   240
##    probabilities: 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 0.038 
##   left son=2 (5328 obs) right son=3 (909 obs)
##   Primary splits:
##       X.1.0000.43 < -0.3167 to the left,  improve=219.5324, (0 missing)
##       X.1.0000.44 < -0.4684 to the left,  improve=216.6450, (0 missing)
##       X.1.0000.42 < -0.2417 to the left,  improve=214.5795, (0 missing)
##       X.1.0000.37 < 0.7483  to the left,  improve=208.9374, (0 missing)
##       X.1.0000.38 < 0.7982  to the left,  improve=204.3112, (0 missing)
##   Surrogate splits:
##       X.1.0000.42 < -0.2451 to the left,  agree=0.993, adj=0.953, (0 split)
##       X.1.0000.44 < -0.4617 to the left,  agree=0.992, adj=0.944, (0 split)
##       X.1.0000.41 < -0.2384 to the left,  agree=0.983, adj=0.882, (0 split)
##       X.1.0000.29 < -0.9691 to the left,  agree=0.971, adj=0.799, (0 split)
##       X.1.0000.14 < -0.99   to the left,  agree=0.968, adj=0.783, (0 split)
## 
## Node number 2: 5328 observations,    complexity param=0.03985326
##   predicted class=4   expected loss=0.954955  P(node) =0.8542569
##     class counts:   239   236   239   240   239    31   240     9   240   240   240   239   240   240   240   239   240   240     2   240   240   239   239    18   240   239
##    probabilities: 0.045 0.044 0.045 0.045 0.045 0.006 0.045 0.002 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.045 0.000 0.045 0.045 0.045 0.045 0.003 0.045 0.045 
##   left son=4 (2713 obs) right son=5 (2615 obs)
##   Primary splits:
##       X1.0000.21 < 0.5396  to the left,  improve=198.8763, (0 missing)
##       X1.0000.20 < 0.602   to the left,  improve=191.2544, (0 missing)
##       X1.0000.19 < 0.5761  to the right, improve=188.5564, (0 missing)
##       X1.0000.18 < 0.5755  to the right, improve=185.1640, (0 missing)
##       X1.0000.17 < 0.6002  to the right, improve=184.1110, (0 missing)
##   Surrogate splits:
##       X1.0000.20 < 0.5351  to the left,  agree=0.980, adj=0.959, (0 split)
##       X1.0000.19 < 0.5059  to the left,  agree=0.968, adj=0.934, (0 split)
##       X1.0000.18 < 0.46    to the left,  agree=0.956, adj=0.909, (0 split)
##       X1.0000.17 < 0.4483  to the left,  agree=0.950, adj=0.898, (0 split)
##       X1.0000.16 < 0.4191  to the left,  agree=0.945, adj=0.888, (0 split)
## 
## Node number 3: 909 observations,    complexity param=0.03518426
##   predicted class=19  expected loss=0.7381738  P(node) =0.1457431
##     class counts:     0     4     1     0     1   207     0   231     0     0     0     1     0     0     0     1     0     0   238     0     0     1     1   222     0     1
##    probabilities: 0.000 0.004 0.001 0.000 0.001 0.228 0.000 0.254 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.000 0.000 0.262 0.000 0.000 0.001 0.001 0.244 0.000 0.001 
##   left son=6 (677 obs) right son=7 (232 obs)
##   Primary splits:
##       X.1.0000.37 < 0.7483  to the left,  improve=194.6172, (0 missing)
##       X.1.0000.36 < -0.3584 to the right, improve=193.9371, (0 missing)
##       X.1.0000.38 < 0.7982  to the left,  improve=187.1875, (0 missing)
##       X.1.0000.39 < 0.8283  to the left,  improve=174.9514, (0 missing)
##       X.0.9268    < -0.4701 to the right, improve=164.1616, (0 missing)
##   Surrogate splits:
##       X.1.0000.38 < 0.8066  to the left,  agree=0.970, adj=0.884, (0 split)
##       X.1.0000.36 < 0.6983  to the left,  agree=0.956, adj=0.828, (0 split)
##       X.1.0000.39 < 0.8749  to the left,  agree=0.952, adj=0.810, (0 split)
##       X.0.9268    < 0.3274  to the left,  agree=0.887, adj=0.556, (0 split)
##       X.1.0000.40 < 0.9433  to the left,  agree=0.886, adj=0.552, (0 split)
## 
## Node number 4: 2713 observations,    complexity param=0.03410038
##   predicted class=21  expected loss=0.911537  P(node) =0.4349848
##     class counts:     7   231   219   235   233     2   231     1     0    52     7     0     5     7     0   231   223     0     0   233   240   233    77     0    11   235
##    probabilities: 0.003 0.085 0.081 0.087 0.086 0.001 0.085 0.000 0.000 0.019 0.003 0.000 0.002 0.003 0.000 0.085 0.082 0.000 0.000 0.086 0.088 0.086 0.028 0.000 0.004 0.087 
##   left son=8 (245 obs) right son=9 (2468 obs)
##   Primary splits:
##       X.1.0000.33 < 0.8149  to the right, improve=167.1838, (0 missing)
##       X.1.0000.32 < 0.5483  to the right, improve=162.4746, (0 missing)
##       X.1.0000.34 < 0.8982  to the right, improve=154.5868, (0 missing)
##       X.1.0000.31 < 0.2716  to the right, improve=154.5296, (0 missing)
##       X.1.0000.35 < 0.8533  to the right, improve=144.2015, (0 missing)
##   Surrogate splits:
##       X.1.0000.32 < 0.5833  to the right, agree=0.986, adj=0.845, (0 split)
##       X.1.0000.34 < 0.8982  to the right, agree=0.982, adj=0.800, (0 split)
##       X.1.0000.31 < 0.2716  to the right, agree=0.976, adj=0.735, (0 split)
##       X.1.0000.35 < 0.9549  to the right, agree=0.962, adj=0.584, (0 split)
##       X.1.0000.30 < 0.0033  to the right, agree=0.961, adj=0.567, (0 split)
## 
## Node number 5: 2615 observations,    complexity param=0.03518426
##   predicted class=9   expected loss=0.9082218  P(node) =0.4192721
##     class counts:   232     5    20     5     6    29     9     8   240   188   233   239   235   233   240     8    17   240     2     7     0     6   162    18   229     4
##    probabilities: 0.089 0.002 0.008 0.002 0.002 0.011 0.003 0.003 0.092 0.072 0.089 0.091 0.090 0.089 0.092 0.003 0.007 0.092 0.001 0.003 0.000 0.002 0.062 0.007 0.088 0.002 
##   left son=10 (640 obs) right son=11 (1975 obs)
##   Primary splits:
##       X.1.0000.5  < -0.9985 to the right, improve=124.8855, (0 missing)
##       X.1.0000.20 < -0.9986 to the right, improve=124.8855, (0 missing)
##       X.1.0000.35 < -0.9967 to the right, improve=124.0233, (0 missing)
##       X.1.0000.50 < -0.9857 to the right, improve=117.9713, (0 missing)
##       X0.0790     < 0.1257  to the right, improve=116.6904, (0 missing)
##   Surrogate splits:
##       X.1.0000.20 < -0.9986 to the right, agree=1.000, adj=1.000, (0 split)
##       X.1.0000.35 < -0.9967 to the right, agree=0.998, adj=0.991, (0 split)
##       X.1.0000.50 < -0.9857 to the right, agree=0.991, adj=0.963, (0 split)
##       X.1.0000.4  < -0.9982 to the right, agree=0.928, adj=0.708, (0 split)
##       X.1.0000.19 < -0.9981 to the right, agree=0.928, adj=0.708, (0 split)
## 
## Node number 6: 677 observations,    complexity param=0.02943138
##   predicted class=8   expected loss=0.661743  P(node) =0.1085458
##     class counts:     0     4     1     0     1   198     0   229     0     0     0     1     0     0     0     1     0     0    18     0     0     1     1   221     0     1
##    probabilities: 0.000 0.006 0.001 0.000 0.001 0.292 0.000 0.338 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.001 0.000 0.000 0.027 0.000 0.000 0.001 0.001 0.326 0.000 0.001 
##   left son=12 (207 obs) right son=13 (470 obs)
##   Primary splits:
##       X.1.0000.36 < -0.3584 to the right, improve=155.1836, (0 missing)
##       X1.0000.21  < 0.9688  to the left,  improve=143.9341, (0 missing)
##       X.1.0000.44 < 0.8616  to the left,  improve=143.7075, (0 missing)
##       X.1.0000.37 < -0.4834 to the right, improve=135.5013, (0 missing)
##       X.1.0000.28 < -0.8225 to the left,  improve=129.7703, (0 missing)
##   Surrogate splits:
##       X.1.0000.37 < -0.1017 to the right, agree=0.931, adj=0.773, (0 split)
##       X.0.9268    < -0.4784 to the right, agree=0.904, adj=0.686, (0 split)
##       X.1.0000.28 < -0.842  to the left,  agree=0.873, adj=0.585, (0 split)
##       X.1.0000.29 < -0.866  to the left,  agree=0.871, adj=0.580, (0 split)
##       X.1.0000.21 < -0.9419 to the right, agree=0.843, adj=0.488, (0 split)
## 
## Node number 7: 232 observations
##   predicted class=19  expected loss=0.05172414  P(node) =0.03719737
##     class counts:     0     0     0     0     0     9     0     2     0     0     0     0     0     0     0     0     0     0   220     0     0     0     0     1     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.039 0.000 0.009 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.948 0.000 0.000 0.000 0.000 0.004 0.000 0.000 
## 
## Node number 8: 245 observations
##   predicted class=3   expected loss=0.1755102  P(node) =0.03928171
##     class counts:     0     0   202     0     0     0     0     0     0     0     0     0     0     0     0     0     4     0     0     2     0     0     0     0     0    37
##    probabilities: 0.000 0.000 0.824 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.016 0.000 0.000 0.008 0.000 0.000 0.000 0.000 0.000 0.151 
## 
## Node number 9: 2468 observations,    complexity param=0.03410038
##   predicted class=21  expected loss=0.9027553  P(node) =0.3957031
##     class counts:     7   231    17   235   233     2   231     1     0    52     7     0     5     7     0   231   219     0     0   231   240   233    77     0    11   198
##    probabilities: 0.003 0.094 0.007 0.095 0.094 0.001 0.094 0.000 0.000 0.021 0.003 0.000 0.002 0.003 0.000 0.094 0.089 0.000 0.000 0.094 0.097 0.094 0.031 0.000 0.004 0.080 
##   left son=18 (973 obs) right son=19 (1495 obs)
##   Primary splits:
##       X.1.0000.5  < -0.9983 to the left,  improve=135.5242, (0 missing)
##       X.1.0000.20 < -0.9981 to the left,  improve=135.5242, (0 missing)
##       X.1.0000.35 < -0.9984 to the left,  improve=134.0401, (0 missing)
##       X.1.0000.50 < -0.9857 to the left,  improve=132.9583, (0 missing)
##       X.1.0000.4  < -0.998  to the left,  improve=117.7130, (0 missing)
##   Surrogate splits:
##       X.1.0000.20 < -0.9981 to the left,  agree=1.000, adj=1.000, (0 split)
##       X.1.0000.35 < -0.9984 to the left,  agree=0.998, adj=0.995, (0 split)
##       X.1.0000.50 < -0.9857 to the left,  agree=0.982, adj=0.955, (0 split)
##       X.1.0000.4  < -0.998  to the left,  agree=0.917, adj=0.788, (0 split)
##       X.1.0000.19 < -0.9974 to the left,  agree=0.917, adj=0.788, (0 split)
## 
## Node number 10: 640 observations,    complexity param=0.016675
##   predicted class=11  expected loss=0.6515625  P(node) =0.1026134
##     class counts:    19     0    20     0     0     3     8     1    12   183   223    17    20    18    16     4    17    13     1     6     0     3    31     1    20     4
##    probabilities: 0.030 0.000 0.031 0.000 0.000 0.005 0.013 0.002 0.019 0.286 0.348 0.027 0.031 0.028 0.025 0.006 0.027 0.020 0.002 0.009 0.000 0.005 0.048 0.002 0.031 0.006 
##   left son=20 (142 obs) right son=21 (498 obs)
##   Primary splits:
##       X1.0000.15 < 0.5336  to the left,  improve=60.82991, (0 missing)
##       X1.0000.14 < 0.5043  to the left,  improve=58.22395, (0 missing)
##       X1.0000.16 < 0.5369  to the left,  improve=55.48650, (0 missing)
##       X1.0000.17 < 0.5652  to the left,  improve=53.39219, (0 missing)
##       X0.6000    < 0.1909  to the left,  improve=46.37470, (0 missing)
##   Surrogate splits:
##       X1.0000.14 < 0.5068  to the left,  agree=0.991, adj=0.958, (0 split)
##       X1.0000.16 < 0.5461  to the left,  agree=0.981, adj=0.915, (0 split)
##       X1.0000.17 < 0.5652  to the left,  agree=0.975, adj=0.887, (0 split)
##       X1.0000.18 < 0.6015  to the left,  agree=0.955, adj=0.796, (0 split)
##       X1.0000.19 < 0.607   to the left,  agree=0.919, adj=0.634, (0 split)
## 
## Node number 11: 1975 observations,    complexity param=0.02651326
##   predicted class=9   expected loss=0.884557  P(node) =0.3166586
##     class counts:   213     5     0     5     6    26     1     7   228     5    10   222   215   215   224     4     0   227     1     1     0     3   131    17   209     0
##    probabilities: 0.108 0.003 0.000 0.003 0.003 0.013 0.001 0.004 0.115 0.003 0.005 0.112 0.109 0.109 0.113 0.002 0.000 0.115 0.001 0.001 0.000 0.002 0.066 0.009 0.106 0.000 
##   left son=22 (1823 obs) right son=23 (152 obs)
##   Primary splits:
##       X0.1718    < 0.7495  to the left,  improve=125.46140, (0 missing)
##       X1.0000.12 < 0.6143  to the right, improve=106.13760, (0 missing)
##       X.0.2000   < 0.3958  to the left,  improve=104.80330, (0 missing)
##       X0.0790    < 0.1257  to the right, improve= 99.57897, (0 missing)
##       X0.3152    < -0.3407 to the right, improve= 90.40091, (0 missing)
##   Surrogate splits:
##       X1.0000.15 < 0.4082  to the right, agree=0.955, adj=0.414, (0 split)
##       X1.0000.16 < 0.4582  to the right, agree=0.955, adj=0.414, (0 split)
##       X1.0000.14 < 0.4429  to the right, agree=0.954, adj=0.408, (0 split)
##       X1.0000.17 < 0.4708  to the right, agree=0.954, adj=0.401, (0 split)
##       X1.0000.18 < 0.4869  to the right, agree=0.951, adj=0.362, (0 split)
## 
## Node number 12: 207 observations
##   predicted class=6   expected loss=0.1352657  P(node) =0.03318903
##     class counts:     0     0     0     0     1   179     0     9     0     0     0     0     0     0     0     1     0     0    16     0     0     0     0     1     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.005 0.865 0.000 0.043 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.005 0.000 0.000 0.077 0.000 0.000 0.000 0.000 0.005 0.000 0.000 
## 
## Node number 13: 470 observations,    complexity param=0.02943138
##   predicted class=8   expected loss=0.5319149  P(node) =0.07535674
##     class counts:     0     4     1     0     0    19     0   220     0     0     0     1     0     0     0     0     0     0     2     0     0     1     1   220     0     1
##    probabilities: 0.000 0.009 0.002 0.000 0.000 0.040 0.000 0.468 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.004 0.000 0.000 0.002 0.002 0.468 0.000 0.002 
##   left son=26 (217 obs) right son=27 (253 obs)
##   Primary splits:
##       X1.0000.21  < 0.9998  to the left,  improve=143.7542, (0 missing)
##       X.1.0000.44 < 0.8616  to the left,  improve=129.4166, (0 missing)
##       X0.6154     < 0.32    to the right, improve=127.7065, (0 missing)
##       X1.0000.20  < 0.9718  to the left,  improve=125.5350, (0 missing)
##       X1.0000.19  < 0.9949  to the left,  improve=115.7338, (0 missing)
##   Surrogate splits:
##       X1.0000.20 < 0.9718  to the left,  agree=0.966, adj=0.926, (0 split)
##       X1.0000.19 < 0.9949  to the left,  agree=0.953, adj=0.899, (0 split)
##       X1.0000.18 < 0.9742  to the left,  agree=0.940, adj=0.871, (0 split)
##       X1.0000.17 < 0.9908  to the left,  agree=0.938, adj=0.866, (0 split)
##       X1.0000.16 < 0.9913  to the left,  agree=0.932, adj=0.853, (0 split)
## 
## Node number 18: 973 observations,    complexity param=0.02151076
##   predicted class=5   expected loss=0.7728674  P(node) =0.1560045
##     class counts:     7   205     0   199   221     0     5     1     0     1     3     0     3     7     0    40     1     0     0    14   184    57    12     0    10     3
##    probabilities: 0.007 0.211 0.000 0.205 0.227 0.000 0.005 0.001 0.000 0.001 0.003 0.000 0.003 0.007 0.000 0.041 0.001 0.000 0.000 0.014 0.189 0.059 0.012 0.000 0.010 0.003 
##   left son=36 (808 obs) right son=37 (165 obs)
##   Primary splits:
##       X1.0000.6  < -0.0303 to the right, improve=99.32718, (0 missing)
##       X1.0000.7  < -0.1172 to the right, improve=94.48481, (0 missing)
##       X1.0000.5  < 0.1598  to the right, improve=85.96639, (0 missing)
##       X1.0000.11 < 0.3039  to the right, improve=58.06874, (0 missing)
##       X1.0000.12 < 0.9857  to the right, improve=49.49880, (0 missing)
##   Surrogate splits:
##       X1.0000.7  < -0.2946 to the right, agree=0.911, adj=0.473, (0 split)
##       X1.0000.5  < -0.1693 to the right, agree=0.888, adj=0.339, (0 split)
##       X1.0000.11 < -0.0367 to the right, agree=0.875, adj=0.261, (0 split)
##       X1.0000.4  < -0.0095 to the right, agree=0.868, adj=0.224, (0 split)
##       X1.0000.3  < -0.1125 to the right, agree=0.854, adj=0.139, (0 split)
## 
## Node number 19: 1495 observations,    complexity param=0.02651326
##   predicted class=7   expected loss=0.8488294  P(node) =0.2396986
##     class counts:     0    26    17    36    12     2   226     0     0    51     4     0     2     0     0   191   218     0     0   217    56   176    65     0     1   195
##    probabilities: 0.000 0.017 0.011 0.024 0.008 0.001 0.151 0.000 0.000 0.034 0.003 0.000 0.001 0.000 0.000 0.128 0.146 0.000 0.000 0.145 0.037 0.118 0.043 0.000 0.001 0.130 
##   left son=38 (1191 obs) right son=39 (304 obs)
##   Primary splits:
##       X1.0000.5  < 0.1433  to the right, improve=90.10207, (0 missing)
##       X1.0000.29 < 0.3596  to the right, improve=88.11785, (0 missing)
##       X1.0000.6  < -0.0423 to the right, improve=79.08815, (0 missing)
##       X.0.4000   < 0.2143  to the right, improve=78.10916, (0 missing)
##       X1.0000.7  < -0.0703 to the right, improve=74.22847, (0 missing)
##   Surrogate splits:
##       X1.0000.11 < 0.1516  to the right, agree=0.906, adj=0.536, (0 split)
##       X1.0000.3  < 0.2347  to the right, agree=0.903, adj=0.523, (0 split)
##       X1.0000.6  < -0.2143 to the right, agree=0.880, adj=0.408, (0 split)
##       X1.0000.7  < -0.2014 to the right, agree=0.870, adj=0.359, (0 split)
##       X1.0000.2  < 0.1382  to the right, agree=0.836, adj=0.194, (0 split)
## 
## Node number 20: 142 observations
##   predicted class=10  expected loss=0.2042254  P(node) =0.02276736
##     class counts:     0     0     3     0     0     0     1     0     0   113    13     0     0     2     0     1     4     0     0     3     0     0     0     0     1     1
##    probabilities: 0.000 0.000 0.021 0.000 0.000 0.000 0.007 0.000 0.000 0.796 0.092 0.000 0.000 0.014 0.000 0.007 0.028 0.000 0.000 0.021 0.000 0.000 0.000 0.000 0.007 0.007 
## 
## Node number 21: 498 observations
##   predicted class=11  expected loss=0.5783133  P(node) =0.07984608
##     class counts:    19     0    17     0     0     3     7     1    12    70   210    17    20    16    16     3    13    13     1     3     0     3    31     1    19     3
##    probabilities: 0.038 0.000 0.034 0.000 0.000 0.006 0.014 0.002 0.024 0.141 0.422 0.034 0.040 0.032 0.032 0.006 0.026 0.026 0.002 0.006 0.000 0.006 0.062 0.002 0.038 0.006 
## 
## Node number 22: 1823 observations,    complexity param=0.02651326
##   predicted class=9   expected loss=0.8749314  P(node) =0.292288
##     class counts:   213     5     0     4     6    26     0     7   228     4     9   222   213   214   224     3     0   227     1     1     0     2   129    17    68     0
##    probabilities: 0.117 0.003 0.000 0.002 0.003 0.014 0.000 0.004 0.125 0.002 0.005 0.122 0.117 0.117 0.123 0.002 0.000 0.125 0.001 0.001 0.000 0.001 0.071 0.009 0.037 0.000 
##   left son=44 (1299 obs) right son=45 (524 obs)
##   Primary splits:
##       X.0.2000    < 0.4505  to the left,  improve=102.18040, (0 missing)
##       X0.0790     < 0.2487  to the right, improve=100.18010, (0 missing)
##       X.1.0000.52 < 0.4429  to the left,  improve= 95.14058, (0 missing)
##       X0.3152     < -0.3407 to the right, improve= 86.04142, (0 missing)
##       X1.0000.6   < 0.0759  to the right, improve= 69.16199, (0 missing)
##   Surrogate splits:
##       X.0.6000   < 0.496   to the left,  agree=0.830, adj=0.410, (0 split)
##       X1.0000.26 < -0.895  to the right, agree=0.728, adj=0.055, (0 split)
##       X1.0000.28 < -0.5833 to the right, agree=0.716, adj=0.011, (0 split)
##       X1.0000.2  < -0.6876 to the right, agree=0.715, adj=0.008, (0 split)
##       X0.1718    < -0.4862 to the right, agree=0.714, adj=0.006, (0 split)
## 
## Node number 23: 152 observations
##   predicted class=25  expected loss=0.07236842  P(node) =0.02437069
##     class counts:     0     0     0     1     0     0     1     0     0     1     1     0     2     1     0     1     0     0     0     0     0     1     2     0   141     0
##    probabilities: 0.000 0.000 0.000 0.007 0.000 0.000 0.007 0.000 0.000 0.007 0.007 0.000 0.013 0.007 0.000 0.007 0.000 0.000 0.000 0.000 0.000 0.007 0.013 0.000 0.928 0.000 
## 
## Node number 26: 217 observations
##   predicted class=8   expected loss=0.1013825  P(node) =0.03479237
##     class counts:     0     4     1     0     0     3     0   195     0     0     0     0     0     0     0     0     0     0     0     0     0     1     0    12     0     1
##    probabilities: 0.000 0.018 0.005 0.000 0.000 0.014 0.000 0.899 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.005 0.000 0.055 0.000 0.005 
## 
## Node number 27: 253 observations
##   predicted class=24  expected loss=0.1778656  P(node) =0.04056437
##     class counts:     0     0     0     0     0    16     0    25     0     0     0     1     0     0     0     0     0     0     2     0     0     0     1   208     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.063 0.000 0.099 0.000 0.000 0.000 0.004 0.000 0.000 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.000 0.004 0.822 0.000 0.000 
## 
## Node number 36: 808 observations,    complexity param=0.01884275
##   predicted class=5   expected loss=0.7339109  P(node) =0.1295495
##     class counts:     7   198     0   196   215     0     5     1     0     1     3     0     0     5     0    40     1     0     0    14    49    55     7     0     8     3
##    probabilities: 0.009 0.245 0.000 0.243 0.266 0.000 0.006 0.001 0.000 0.001 0.004 0.000 0.000 0.006 0.000 0.050 0.001 0.000 0.000 0.017 0.061 0.068 0.009 0.000 0.010 0.004 
##   left son=72 (348 obs) right son=73 (460 obs)
##   Primary splits:
##       X.0.4000   < -0.9857 to the left,  improve=47.96012, (0 missing)
##       X1.0000.24 < 0.8707  to the right, improve=38.10847, (0 missing)
##       X1.0000.23 < 0.8493  to the right, improve=37.60073, (0 missing)
##       X1.0000.22 < 0.9322  to the right, improve=37.58566, (0 missing)
##       X1.0000.25 < 0.8652  to the right, improve=32.93544, (0 missing)
##   Surrogate splits:
##       X0.5150    < 0.7864  to the right, agree=0.594, adj=0.057, (0 split)
##       X0.5578    < 0.7824  to the right, agree=0.592, adj=0.052, (0 split)
##       X0.6000.1  < 0.2747  to the left,  agree=0.590, adj=0.049, (0 split)
##       X1.0000.22 < 0.9787  to the right, agree=0.590, adj=0.049, (0 split)
##       X1.0000.23 < 0.9733  to the right, agree=0.589, adj=0.046, (0 split)
## 
## Node number 37: 165 observations
##   predicted class=21  expected loss=0.1818182  P(node) =0.02645503
##     class counts:     0     7     0     3     6     0     0     0     0     0     0     0     3     2     0     0     0     0     0     0   135     2     5     0     2     0
##    probabilities: 0.000 0.042 0.000 0.018 0.036 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.018 0.012 0.000 0.000 0.000 0.000 0.000 0.000 0.818 0.012 0.030 0.000 0.012 0.000 
## 
## Node number 38: 1191 observations,    complexity param=0.02417876
##   predicted class=7   expected loss=0.8161209  P(node) =0.1909572
##     class counts:     0    24    16    34    12     1   219     0     0    36     4     0     0     0     0   184    52     0     0   209    16   170    29     0     0   185
##    probabilities: 0.000 0.020 0.013 0.029 0.010 0.001 0.184 0.000 0.000 0.030 0.003 0.000 0.000 0.000 0.000 0.154 0.044 0.000 0.000 0.175 0.013 0.143 0.024 0.000 0.000 0.155 
##   left son=76 (956 obs) right son=77 (235 obs)
##   Primary splits:
##       X1.0000.29  < 0.3596  to the right, improve=82.49215, (0 missing)
##       X.0.4000    < -0.3571 to the right, improve=81.04785, (0 missing)
##       X1.0000.30  < 0       to the left,  improve=80.81691, (0 missing)
##       X.1.0000.16 < -0.9841 to the left,  improve=80.60869, (0 missing)
##       X.1.0000.1  < -0.9911 to the left,  improve=80.57083, (0 missing)
##   Surrogate splits:
##       X0.5272     < 0.2668  to the right, agree=0.868, adj=0.332, (0 split)
##       X.1.0000.2  < -0.7237 to the left,  agree=0.829, adj=0.132, (0 split)
##       X1.0000.28  < 0.7151  to the left,  agree=0.828, adj=0.128, (0 split)
##       X.1.0000.18 < -0.6414 to the left,  agree=0.827, adj=0.123, (0 split)
##       X.1.0000.1  < -0.77   to the left,  agree=0.826, adj=0.119, (0 split)
## 
## Node number 39: 304 observations
##   predicted class=17  expected loss=0.4539474  P(node) =0.04874138
##     class counts:     0     2     1     2     0     1     7     0     0    15     0     0     2     0     0     7   166     0     0     8    40     6    36     0     1    10
##    probabilities: 0.000 0.007 0.003 0.007 0.000 0.003 0.023 0.000 0.000 0.049 0.000 0.000 0.007 0.000 0.000 0.023 0.546 0.000 0.000 0.026 0.132 0.020 0.118 0.000 0.003 0.033 
## 
## Node number 44: 1299 observations,    complexity param=0.02376188
##   predicted class=14  expected loss=0.8360277  P(node) =0.2082732
##     class counts:   210     5     0     4     6    22     0     7    36     4     9   203   208   213    80     3     0   105     1     1     0     2   100    15    65     0
##    probabilities: 0.162 0.004 0.000 0.003 0.005 0.017 0.000 0.005 0.028 0.003 0.007 0.156 0.160 0.164 0.062 0.002 0.000 0.081 0.001 0.001 0.000 0.002 0.077 0.012 0.050 0.000 
##   left son=88 (873 obs) right son=89 (426 obs)
##   Primary splits:
##       X0.3152     < -0.3406 to the right, improve=74.06612, (0 missing)
##       X.1.0000.52 < 0.6143  to the left,  improve=73.48581, (0 missing)
##       X0.0790     < 0.0783  to the right, improve=72.25580, (0 missing)
##       X.0.9714    < -0.7571 to the left,  improve=49.16143, (0 missing)
##       X1.0000.6   < 0.1691  to the right, improve=47.59252, (0 missing)
##   Surrogate splits:
##       X.0.9714    < -0.2143 to the left,  agree=0.785, adj=0.345, (0 split)
##       X.1.0000.52 < 0.9143  to the left,  agree=0.702, adj=0.092, (0 split)
##       X.0.6000    < 0.7272  to the left,  agree=0.680, adj=0.023, (0 split)
##       X.0.2000    < -0.7254 to the right, agree=0.679, adj=0.021, (0 split)
##       X0.0790     < 0.9622  to the left,  agree=0.677, adj=0.016, (0 split)
## 
## Node number 45: 524 observations,    complexity param=0.01750875
##   predicted class=9   expected loss=0.6335878  P(node) =0.08401475
##     class counts:     3     0     0     0     0     4     0     0   192     0     0    19     5     1   144     0     0   122     0     0     0     0    29     2     3     0
##    probabilities: 0.006 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.366 0.000 0.000 0.036 0.010 0.002 0.275 0.000 0.000 0.233 0.000 0.000 0.000 0.000 0.055 0.004 0.006 0.000 
##   left son=90 (254 obs) right son=91 (270 obs)
##   Primary splits:
##       X0.0790   < 0.2656  to the right, improve=90.34397, (0 missing)
##       X1.0000.6 < 0.0704  to the right, improve=73.59495, (0 missing)
##       X1.0000.7 < 0.0746  to the right, improve=72.13282, (0 missing)
##       X1.0000.5 < 0.2708  to the right, improve=52.15184, (0 missing)
##       X.0.9268  < -0.7195 to the right, improve=49.50023, (0 missing)
##   Surrogate splits:
##       X1.0000.6  < 0.0368  to the right, agree=0.828, adj=0.646, (0 split)
##       X1.0000.7  < -0.1719 to the right, agree=0.805, adj=0.598, (0 split)
##       X1.0000.5  < 0.0936  to the right, agree=0.788, adj=0.563, (0 split)
##       X1.0000.3  < 0.2066  to the right, agree=0.754, adj=0.492, (0 split)
##       X1.0000.11 < 0.2546  to the right, agree=0.742, adj=0.469, (0 split)
## 
## Node number 72: 348 observations
##   predicted class=2   expected loss=0.5402299  P(node) =0.05579606
##     class counts:     3   160     0    28   111     0     1     1     0     0     0     0     0     1     0     3     0     0     0     1    15    13     3     0     7     1
##    probabilities: 0.009 0.460 0.000 0.080 0.319 0.000 0.003 0.003 0.000 0.000 0.000 0.000 0.000 0.003 0.000 0.009 0.000 0.000 0.000 0.003 0.043 0.037 0.009 0.000 0.020 0.003 
## 
## Node number 73: 460 observations
##   predicted class=4   expected loss=0.6347826  P(node) =0.07375341
##     class counts:     4    38     0   168   104     0     4     0     0     1     3     0     0     4     0    37     1     0     0    13    34    42     4     0     1     2
##    probabilities: 0.009 0.083 0.000 0.365 0.226 0.000 0.009 0.000 0.000 0.002 0.007 0.000 0.000 0.009 0.000 0.080 0.002 0.000 0.000 0.028 0.074 0.091 0.009 0.000 0.002 0.004 
## 
## Node number 76: 956 observations,    complexity param=0.0156745
##   predicted class=7   expected loss=0.7730126  P(node) =0.1532788
##     class counts:     0    15    10    25    10     1   217     0     0    36     4     0     0     0     0   176    50     0     0   189    16   151    18     0     0    38
##    probabilities: 0.000 0.016 0.010 0.026 0.010 0.001 0.227 0.000 0.000 0.038 0.004 0.000 0.000 0.000 0.000 0.184 0.052 0.000 0.000 0.198 0.017 0.158 0.019 0.000 0.000 0.040 
##   left son=152 (507 obs) right son=153 (449 obs)
##   Primary splits:
##       X.0.4000    < 0.7     to the right, improve=46.90970, (0 missing)
##       X1.0000.27  < 0.3495  to the left,  improve=44.22357, (0 missing)
##       X.1.0000.50 < 0.9571  to the right, improve=40.72380, (0 missing)
##       X1.0000.28  < 0.3703  to the left,  improve=35.34098, (0 missing)
##       X1.0000.30  < 0       to the left,  improve=32.07147, (0 missing)
##   Surrogate splits:
##       X.1.0000.20 < -0.9148 to the left,  agree=0.804, adj=0.584, (0 split)
##       X.1.0000.48 < -0.9857 to the left,  agree=0.771, adj=0.512, (0 split)
##       X.1.0000.18 < -0.9903 to the left,  agree=0.766, adj=0.501, (0 split)
##       X.1.0000.3  < -0.9924 to the left,  agree=0.764, adj=0.497, (0 split)
##       X.1.0000.33 < -0.9975 to the left,  agree=0.764, adj=0.497, (0 split)
## 
## Node number 77: 235 observations
##   predicted class=26  expected loss=0.3744681  P(node) =0.03767837
##     class counts:     0     9     6     9     2     0     2     0     0     0     0     0     0     0     0     8     2     0     0    20     0    19    11     0     0   147
##    probabilities: 0.000 0.038 0.026 0.038 0.009 0.000 0.009 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.034 0.009 0.000 0.000 0.085 0.000 0.081 0.047 0.000 0.000 0.626 
## 
## Node number 88: 873 observations,    complexity param=0.02376188
##   predicted class=1   expected loss=0.7880871  P(node) =0.1399711
##     class counts:   185     5     0     4     6    22     0     6    36     4     8   184    48    70    72     3     0   103     1     1     0     2    33    15    65     0
##    probabilities: 0.212 0.006 0.000 0.005 0.007 0.025 0.000 0.007 0.041 0.005 0.009 0.211 0.055 0.080 0.082 0.003 0.000 0.118 0.001 0.001 0.000 0.002 0.038 0.017 0.074 0.000 
##   left son=176 (543 obs) right son=177 (330 obs)
##   Primary splits:
##       X0.0790   < 0.0854  to the right, improve=76.36535, (0 missing)
##       X1.0000.7 < 0.0201  to the right, improve=50.45657, (0 missing)
##       X0.5958   < 0.6467  to the right, improve=49.41710, (0 missing)
##       X1.0000.5 < 0.4168  to the right, improve=46.33057, (0 missing)
##       X1.0000.6 < 0.0779  to the right, improve=45.58592, (0 missing)
##   Surrogate splits:
##       X0.5958   < 0.348   to the right, agree=0.772, adj=0.397, (0 split)
##       X1.0000.3 < 0.2368  to the right, agree=0.764, adj=0.376, (0 split)
##       X1.0000.5 < 0.0569  to the right, agree=0.749, adj=0.336, (0 split)
##       X1.0000.2 < 0.3379  to the right, agree=0.741, adj=0.315, (0 split)
##       X1.0000.7 < -0.232  to the right, agree=0.727, adj=0.279, (0 split)
## 
## Node number 89: 426 observations,    complexity param=0.0103385
##   predicted class=13  expected loss=0.6244131  P(node) =0.06830207
##     class counts:    25     0     0     0     0     0     0     1     0     0     1    19   160   143     8     0     0     2     0     0     0     0    67     0     0     0
##    probabilities: 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.000 0.000 0.002 0.045 0.376 0.336 0.019 0.000 0.000 0.005 0.000 0.000 0.000 0.000 0.157 0.000 0.000 0.000 
##   left son=178 (361 obs) right son=179 (65 obs)
##   Primary splits:
##       X.1.0000.52 < 0.6857  to the left,  improve=68.83559, (0 missing)
##       X.0.6000    < 0.3764  to the left,  improve=40.76568, (0 missing)
##       X1.0000.1   < 0.248   to the right, improve=34.40573, (0 missing)
##       X.1.0000.51 < -0.2429 to the left,  improve=33.83566, (0 missing)
##       X1.0000.24  < -0.4582 to the right, improve=18.04496, (0 missing)
##   Surrogate splits:
##       X.1.0000.51 < 0.7     to the left,  agree=0.915, adj=0.446, (0 split)
##       X1.0000.1   < 0.0779  to the right, agree=0.908, adj=0.400, (0 split)
##       X.0.6000    < 0.4425  to the left,  agree=0.908, adj=0.400, (0 split)
##       X1.0000.2   < -0.2367 to the right, agree=0.869, adj=0.138, (0 split)
##       X0.8222     < -0.4346 to the right, agree=0.859, adj=0.077, (0 split)
## 
## Node number 90: 254 observations
##   predicted class=9   expected loss=0.3031496  P(node) =0.04072471
##     class counts:     3     0     0     0     0     3     0     0   177     0     0     3     4     1    24     0     0     9     0     0     0     0    25     2     3     0
##    probabilities: 0.012 0.000 0.000 0.000 0.000 0.012 0.000 0.000 0.697 0.000 0.000 0.012 0.016 0.004 0.094 0.000 0.000 0.035 0.000 0.000 0.000 0.000 0.098 0.008 0.012 0.000 
## 
## Node number 91: 270 observations
##   predicted class=15  expected loss=0.5555556  P(node) =0.04329004
##     class counts:     0     0     0     0     0     1     0     0    15     0     0    16     1     0   120     0     0   113     0     0     0     0     4     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.004 0.000 0.000 0.056 0.000 0.000 0.059 0.004 0.000 0.444 0.000 0.000 0.419 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000 
## 
## Node number 152: 507 observations
##   predicted class=7   expected loss=0.6587771  P(node) =0.08128908
##     class counts:     0     1     0     2     4     1   173     0     0    31     3     0     0     0     0   138     9     0     0   113     2    13    17     0     0     0
##    probabilities: 0.000 0.002 0.000 0.004 0.008 0.002 0.341 0.000 0.000 0.061 0.006 0.000 0.000 0.000 0.000 0.272 0.018 0.000 0.000 0.223 0.004 0.026 0.034 0.000 0.000 0.000 
## 
## Node number 153: 449 observations,    complexity param=0.01017175
##   predicted class=22  expected loss=0.6926503  P(node) =0.07198974
##     class counts:     0    14    10    23     6     0    44     0     0     5     1     0     0     0     0    38    41     0     0    76    14   138     1     0     0    38
##    probabilities: 0.000 0.031 0.022 0.051 0.013 0.000 0.098 0.000 0.000 0.011 0.002 0.000 0.000 0.000 0.000 0.085 0.091 0.000 0.000 0.169 0.031 0.307 0.002 0.000 0.000 0.085 
##   left son=306 (189 obs) right son=307 (260 obs)
##   Primary splits:
##       X.1.0000.50 < 0.9285  to the right, improve=44.79498, (0 missing)
##       X.1.0000.49 < 0.9714  to the right, improve=40.48966, (0 missing)
##       X.1.0000.20 < -0.5899 to the right, improve=35.73415, (0 missing)
##       X.1.0000.35 < -0.1467 to the right, improve=31.73381, (0 missing)
##       X1.0000.6   < -0.0508 to the right, improve=19.73595, (0 missing)
##   Surrogate splits:
##       X.1.0000.49 < 0.7     to the right, agree=0.875, adj=0.704, (0 split)
##       X.0.4000    < -0.5571 to the right, agree=0.713, adj=0.317, (0 split)
##       X.1.0000.20 < -0.6108 to the right, agree=0.708, adj=0.307, (0 split)
##       X.1.0000.16 < -0.984  to the left,  agree=0.699, adj=0.286, (0 split)
##       X.1.0000.35 < -0.6617 to the right, agree=0.695, adj=0.275, (0 split)
## 
## Node number 176: 543 observations
##   predicted class=1   expected loss=0.6629834  P(node) =0.08706109
##     class counts:   183     5     0     2     4    20     0     6    32     4     8    29    36    54    33     3     0    12     1     1     0     2    31    15    62     0
##    probabilities: 0.337 0.009 0.000 0.004 0.007 0.037 0.000 0.011 0.059 0.007 0.015 0.053 0.066 0.099 0.061 0.006 0.000 0.022 0.002 0.002 0.000 0.004 0.057 0.028 0.114 0.000 
## 
## Node number 177: 330 observations
##   predicted class=12  expected loss=0.530303  P(node) =0.05291005
##     class counts:     2     0     0     2     2     2     0     0     4     0     0   155    12    16    39     0     0    91     0     0     0     0     2     0     3     0
##    probabilities: 0.006 0.000 0.000 0.006 0.006 0.006 0.000 0.000 0.012 0.000 0.000 0.470 0.036 0.048 0.118 0.000 0.000 0.276 0.000 0.000 0.000 0.000 0.006 0.000 0.009 0.000 
## 
## Node number 178: 361 observations
##   predicted class=13  expected loss=0.5595568  P(node) =0.05788039
##     class counts:    25     0     0     0     0     0     0     1     0     0     1    19   159   142     8     0     0     2     0     0     0     0     4     0     0     0
##    probabilities: 0.069 0.000 0.000 0.000 0.000 0.000 0.000 0.003 0.000 0.000 0.003 0.053 0.440 0.393 0.022 0.000 0.000 0.006 0.000 0.000 0.000 0.000 0.011 0.000 0.000 0.000 
## 
## Node number 179: 65 observations
##   predicted class=23  expected loss=0.03076923  P(node) =0.01042168
##     class counts:     0     0     0     0     0     0     0     0     0     0     0     0     1     1     0     0     0     0     0     0     0     0    63     0     0     0
##    probabilities: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.015 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.969 0.000 0.000 0.000 
## 
## Node number 306: 189 observations
##   predicted class=20  expected loss=0.6296296  P(node) =0.03030303
##     class counts:     0     0     0     1     0     0    41     0     0     4     1     0     0     0     0    28    28     0     0    70     1     9     1     0     0     5
##    probabilities: 0.000 0.000 0.000 0.005 0.000 0.000 0.217 0.000 0.000 0.021 0.005 0.000 0.000 0.000 0.000 0.148 0.148 0.000 0.000 0.370 0.005 0.048 0.005 0.000 0.000 0.026 
## 
## Node number 307: 260 observations
##   predicted class=22  expected loss=0.5038462  P(node) =0.04168671
##     class counts:     0    14    10    22     6     0     3     0     0     1     0     0     0     0     0    10    13     0     0     6    13   129     0     0     0    33
##    probabilities: 0.000 0.054 0.038 0.085 0.023 0.000 0.012 0.000 0.000 0.004 0.000 0.000 0.000 0.000 0.000 0.038 0.050 0.000 0.000 0.023 0.050 0.496 0.000 0.000 0.000 0.127
prp(tree1,type=5,extra=104,nn=TRUE,tweak=1.7)

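# predict.rpart takes `type`, not `method`; type="prob" returns one probability column per class
y <- predict(tree1,type="prob",newdata=dataTE)
# for each test instance, take the column index of the most probable class
preds <- unname(apply(y,1,which.max))
tab <- table(preds,dataTE$X1.)
# note: diag(tab) counts correct predictions only if tab is square with rows and columns in the same class order
sum(diag(tab))/length(dataTE$X1.) #accuracy is again 0.0417, so it does not change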
## [1] 0.04172015
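The accuracy above is read off the diagonal of a contingency table, which is only valid when the rows and columns refer to the same classes in the same order. The following is a minimal sketch on made-up toy labels (`truth` and `preds` here are hypothetical, not the Isolet data) showing how the diagonal can mislead when the model never predicts one of the classes, and two ways to guard against it.

```r
# Toy labels: the true classes span 1..4, but class 3 is never predicted
truth <- c(1, 2, 3, 4, 4, 2)
preds <- c(1, 2, 4, 4, 4, 1)

# Direct comparison does not depend on the layout of any table
acc_direct <- mean(preds == truth)

# Giving both vectors the same levels makes the table square and aligned
lv  <- sort(unique(c(truth, preds)))
tab <- table(factor(preds, levels = lv), factor(truth, levels = lv))
acc_table <- sum(diag(tab)) / length(truth)

# Without shared levels the table is 3 x 4, so diag() pairs
# predicted class 4 with true class 3 and counts an error as a hit
tab_raw <- table(preds, truth)
acc_raw <- sum(diag(tab_raw)) / length(truth)

print(c(direct = acc_direct, aligned = acc_table, raw = acc_raw))
```

If the tree never predicts some of the 26 letters, the same misalignment would affect the figure reported above, so comparing `preds` with the true labels directly is the safer check.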
###RF

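# note: trControl is an argument of caret::train(); randomForest() does not use it, so no cross-validation is run here
rf.isolet=randomForest(X1.~.,data=dataTR,mtry=4,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))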
plot(rf.isolet)

varImpPlot(rf.isolet)

pred.isolet = predict(rf.isolet,newdata=dataTE)
pred.isolet <- round(pred.isolet)
tab <- table(pred.isolet,dataTE$X1.)
sum(diag(tab))/length(dataTE$X1.) #accuracy is 0.0154, lower than the 0.0417 of the single tree
## [1] 0.01540436
rf.isolet=randomForest(X1.~.,data=dataTR,mtry=5,ntree=500,nodesize=5,trControl = trainControl("cv", number = 10))
plot(rf.isolet)

varImpPlot(rf.isolet)

pred.isolet = predict(rf.isolet,newdata=dataTE)
pred.isolet <- round(pred.isolet)
tab <- table(pred.isolet,dataTE$X1.)
sum(diag(tab))/length(dataTE$X1.) #accuracy is 0.0205 with mtry=5, slightly up from 0.0154 with mtry=4
## [1] 0.02053915
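The `round()` calls above suggest that `X1.` was passed as a numeric column, in which case `randomForest` fits a regression forest on the class codes (this is an assumption; the type of `X1.` is not shown in this section). `randomForest` fits a classification forest when the response is a factor, for example after `as.factor()`. The simplified sketch below uses hypothetical per-tree votes to show why averaging class codes and rounding can return a class that no tree voted for, while a majority vote cannot.

```r
# Hypothetical predictions of five trees for one test instance (class codes 1..26)
votes <- c(1, 1, 1, 26, 26)

# Regression-style aggregation: average the codes, then round to a class code
averaged <- round(mean(votes))

# Classification-style aggregation: majority vote over the predicted classes
majority <- as.integer(names(which.max(table(votes))))

print(c(averaged = averaged, majority = majority))
```

Here the averaged code lands on a class between the two that were actually voted for, which is one plausible reason for the very low accuracies reported for the forest.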
###gbm

noftrees=100
depth=5
learning_rate=0.2
sampling_fraction=0.5


#boosting_model=gbm(X1.~.,distribution="multinomial", data=dataTR, n.trees = noftrees,interaction.depth = depth,cv.folds=10,class.stratify.cv=TRUE, 
#                   n.minobsinnode = 5, shrinkage =learning_rate,
#                   bag.fraction = sampling_fraction)
#boosting_model
#summary(boosting_model)

#preds = predict.gbm(boosting_model,newdata=dataTE,type="response",single.tree=FALSE)

#Conclusion

To conclude, the model summaries and the train-test results suggest that cross-validation with 10 folds gives reliable estimates for choosing a model. Let me go over the advantages and disadvantages that we learned in IE582 and that I observed while doing this homework.

Firstly, knn is a lazy learner: there is no real training step, so all of the work happens at prediction time, when we have to compute the distance from the desired instance to every instance in the training set. As the dimension of the data, i.e. the number of features, and the size of the training set get larger, obtaining predictions gets expensive. Also, defining a distance or proximity metric is difficult for knn, especially on categorical data.

Secondly, tree-based learners are white-box learners, meaning they are interpretable, and they can also handle data with missing feature values. However, they tend to be biased towards features with a high number of unique values compared to features with fewer levels.

I love random forests; they perform incredibly well. They exploit the high-variance, low-bias nature of trees by training a lot of overfitted trees and then averaging their predictions or taking a majority vote.

Lastly, the current version of gbm does not seem to perform well. The package states that a categorical feature with more than two levels is used at one's own risk. Further, the package aborted numerous times while I was trying to make it work. This is why I always love to write my own functions and tune them, knowing exactly what procedures are inside them. The links for the datasets I have used are provided below. Many thanks to Mustafa Hoca, Ilayda and the dataset providers.
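To illustrate the knn points above, here is a minimal sketch of knn on categorical features. It is not the knn function written for this homework: the toy data, the function name `knn_categorical` and the choice of k are all hypothetical. It uses the number of disagreeing features as the distance, which is the complement of counting agreeing features, and it shows that every prediction scans the whole training set.

```r
# Hypothetical toy data: three categorical features and a class label
train <- data.frame(
  cap   = c("flat", "bell", "flat", "bell"),
  odor  = c("none", "foul", "none", "foul"),
  color = c("brown", "white", "white", "brown"),
  stringsAsFactors = FALSE
)
train_y <- c("e", "p", "e", "p")
query <- data.frame(cap = "flat", odor = "foul", color = "white",
                    stringsAsFactors = FALSE)

knn_categorical <- function(train_x, train_y, query, k = 3) {
  q <- unlist(query[1, ], use.names = FALSE)
  # distance = number of features on which a training row disagrees with the query
  mismatches <- apply(as.matrix(train_x), 1, function(row) sum(row != q))
  # the k training rows with the fewest mismatches
  nearest <- order(mismatches)[seq_len(k)]
  # majority vote among the labels of the nearest rows
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

pred <- knn_categorical(train, train_y, query, k = 3)
print(pred)
```

The mismatch count is computed for all training rows on each call, which is the prediction-time cost discussed in the conclusion.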

#References

- Mushroom Data: https://archive.ics.uci.edu/dataset/73/mushroom
- Large Scale Wave Energy Data: https://archive.ics.uci.edu/dataset/882/large-scale+wave+energy+farm
- Isolet Data: https://archive.ics.uci.edu/dataset/54/isolet
- Bankruptcy Data: https://archive.ics.uci.edu/dataset/572/taiwanese+bankruptcy+prediction
- Spambase Data: https://archive.ics.uci.edu/dataset/94/spambase
- Breast Cancer Data: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data